Title
Ternary Policy Iteration Algorithm for Nonlinear Robust Control
Authors
Abstract
The uncertainties in plant dynamics remain a challenge for nonlinear control problems. This paper develops a ternary policy iteration (TPI) algorithm for solving nonlinear robust control problems with bounded uncertainties. The controller and the uncertainty of the system are treated as two opposing game players, and the robust control problem is formulated as a two-player zero-sum differential game. To solve the differential game, the corresponding Hamilton-Jacobi-Isaacs (HJI) equation is derived. Three loss functions and three update phases are designed to match the identity, minimization, and maximization conditions of the HJI equation, respectively. These loss functions are defined as the expectation of the approximate Hamiltonian over a generated state set, which avoids operating on every state in the state space simultaneously. The parameters of the value function and the policies are updated directly by minimizing the designed loss functions via gradient descent. Moreover, zero-initialization can be applied to the parameters of the control policy. The effectiveness of the proposed TPI algorithm is demonstrated through two simulation studies. The simulation results show that the TPI algorithm converges to the optimal solution for a linear plant and exhibits strong disturbance rejection for a nonlinear plant.
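To make the three-phase structure described above concrete, the following is a minimal sketch of such an update loop in JAX. It assumes a scalar plant that is affine in the control and the disturbance, quadratic stage cost with attenuation level gamma, and simple linear-in-parameter approximators for the value function and both policies; all of these modeling choices (f, gamma, the feature maps, learning rate, sampling range) are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the three-phase TPI-style update, under the stated assumptions.
import jax
import jax.numpy as jnp

gamma = 5.0                       # assumed disturbance attenuation level

def f(x):                         # assumed nonlinear drift dynamics
    return -x + 0.5 * jnp.sin(x)

def value(theta, x):              # V(x; theta), quadratic feature (assumed)
    return theta * x**2

def control(w, x):                # u(x; w), linear state feedback (assumed)
    return w * x

def disturb(v, x):                # d(x; v), linear state feedback (assumed)
    return v * x

def hamiltonian(theta, w, v, x):
    # Approximate Hamiltonian: grad(V) * xdot + stage cost - gamma^2 * d^2.
    dVdx = jax.grad(value, argnums=1)(theta, x)
    u, d = control(w, x), disturb(v, x)
    xdot = f(x) + u + d           # plant affine in control and disturbance
    return dVdx * xdot + x**2 + u**2 - gamma**2 * d**2

def batch_H(theta, w, v, xs):
    return jax.vmap(hamiltonian, (None, None, None, 0))(theta, w, v, xs)

# Phase 1: drive the Hamiltonian toward zero (the HJI identity condition).
def value_loss(theta, w, v, xs):
    return jnp.mean(batch_H(theta, w, v, xs) ** 2)

# Phase 2: minimize the expected Hamiltonian w.r.t. the control policy.
def control_loss(w, theta, v, xs):
    return jnp.mean(batch_H(theta, w, v, xs))

# Phase 3: maximize w.r.t. the disturbance policy (minimize its negation).
def disturb_loss(v, theta, w, xs):
    return -jnp.mean(batch_H(theta, w, v, xs))

key = jax.random.PRNGKey(0)
theta, w, v, lr = 1.0, 0.0, 0.0, 1e-3   # zero-initialized control policy
for step in range(2000):
    key, sub = jax.random.split(key)
    # Expectation over a generated state set rather than the whole state space.
    xs = jax.random.uniform(sub, (64,), minval=-2.0, maxval=2.0)
    theta = theta - lr * jax.grad(value_loss)(theta, w, v, xs)
    w = w - lr * jax.grad(control_loss)(w, theta, v, xs)
    v = v - lr * jax.grad(disturb_loss)(v, theta, w, xs)
```

Note how each loss only involves an expectation over the sampled batch `xs`, matching the abstract's point that the Hamiltonian need not be evaluated over the entire state set at once, and how the control parameter `w` can start from zero.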