Paper Title


On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

Authors

Jingliang Duan, Wenhan Cao, Yang Zheng, Lin Zhao

Abstract


The convergence of policy gradient algorithms in reinforcement learning hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be gained by analyzing the landscape of linear quadratic control. However, most of the existing literature only considers the optimization landscape for static full-state or static output-feedback policies (controllers). We investigate the more challenging case of dynamic output-feedback policies for linear quadratic regulation (abbreviated as dLQR), which are prevalent in practice but have a rather complicated optimization landscape. We first show how the dLQR cost varies under coordinate transformations of the dynamic controller and then derive the optimal transformation for a given observable stabilizing controller. At the core of our results is the uniqueness of the stationary point of dLQR when it is observable, which takes the concise form of an observer-based controller with the optimal similarity transformation. These results shed light on designing efficient algorithms for general decision-making problems with partially observed information.
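To make the abstract's key observation concrete, the following is a minimal numerical sketch (not the authors' code): it evaluates a dLQR-style closed-loop cost for a dynamic output-feedback controller via a discrete Lyapunov equation, and checks that applying a similarity transformation `T` to the controller's internal state changes the cost when the initial state covariance is held fixed. The plant, gains, and cost weights below are illustrative assumptions, not values from the paper.

```python
# Sketch: dLQR-style cost of a dynamic output-feedback controller, and its
# dependence on the controller's internal coordinates. All numbers here are
# assumed for illustration.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Plant: x_{t+1} = A x_t + B u_t,  y_t = C x_t (scalar for simplicity)
A, B, C = np.array([[0.9]]), np.array([[1.0]]), np.array([[1.0]])
Q, R = np.array([[1.0]]), np.array([[1.0]])

def dlqr_cost(AK, BK, CK, Sigma0):
    """J = trace(P @ Sigma0), where P solves the closed-loop discrete
    Lyapunov equation P = Acl' P Acl + blkdiag(Q, CK' R CK) for the
    augmented state z = (x, xi)."""
    Acl = np.block([[A, B @ CK], [BK @ C, AK]])
    Qcl = np.block([[Q, np.zeros((1, 1))],
                    [np.zeros((1, 1)), CK.T @ R @ CK]])
    P = solve_discrete_lyapunov(Acl.T, Qcl)  # solves P = Acl' P Acl + Qcl
    return np.trace(P @ Sigma0)

# An assumed observer-based stabilizing controller (gains K=0.5, L=0.4):
#   xi_{t+1} = AK xi_t + BK y_t,  u_t = CK xi_t
AK, BK, CK = np.array([[0.0]]), np.array([[0.4]]), np.array([[-0.5]])
Sigma0 = np.eye(2)  # fixed covariance of the initial (plant, controller) state

# Similarity transformation xi -> T xi:
#   (AK, BK, CK) -> (T AK T^-1, T BK, CK T^-1)
T = np.array([[2.0]])
Tinv = np.linalg.inv(T)
J0 = dlqr_cost(AK, BK, CK, Sigma0)
J1 = dlqr_cost(T @ AK @ Tinv, T @ BK, CK @ Tinv, Sigma0)
print(J0, J1)  # the two costs differ: J is not invariant under T
```

The input-output behavior of the controller is unchanged by `T`, yet the cost differs because the fixed initial covariance of the controller state is not transformed along with it; this is the sense in which the dLQR cost "varies with the coordinate transformation" in the abstract.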
