Title
FORESEE: Prediction with Expansion-Compression Unscented Transform for Online Policy Optimization
Authors
Abstract
Propagating state distributions through a generic, uncertain nonlinear dynamical model is known to be intractable and usually begets numerical or analytical approximations. We introduce a method for state prediction, called the Expansion-Compression Unscented Transform, and use it to solve a class of online policy optimization problems. Our proposed algorithm propagates a finite number of sigma points through a state-dependent distribution, which dictates an increase in the number of sigma points at each time step to represent the resulting distribution; this is what we call the expansion operation. To keep the algorithm scalable, we augment the expansion operation with a compression operation based on moment matching, thereby keeping the number of sigma points constant across predictions over multiple time steps. Its performance is empirically shown to be comparable to Monte Carlo but at a much lower computational cost. Under state and control input constraints, the state prediction is subsequently used in tandem with a proposed variant of constrained gradient descent for online update of policy parameters in a receding horizon fashion. The framework is implemented as a differentiable computational graph for policy training. We showcase our framework for a quadrotor stabilization task as part of a benchmark comparison in safe-control-gym and for optimizing the parameters of a Control Barrier Function-based controller in a leader-follower problem.
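To make the expansion and compression operations concrete, the following is a minimal sketch for a scalar state, not the authors' implementation. It assumes the state-dependent transition is Gaussian, x_{t+1} ~ N(f(x_t), g(x_t)^2), uses the standard unscented-transform sigma points with kappa = 2, and assumes compression matches only the mean and variance. The names `sigma_points`, `moments`, `expand`, `compress`, `predict`, and the dynamics `f`, `g` are hypothetical; the constrained gradient-descent policy update is omitted.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical 1-D illustration of expansion-compression; not the paper's code.
SigmaSet = List[Tuple[float, float]]  # (sigma point, weight) pairs


def sigma_points(mean: float, var: float, kappa: float = 2.0) -> SigmaSet:
    """Standard unscented-transform sigma points for a scalar Gaussian (n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    w0 = kappa / (n + kappa)
    wi = 1.0 / (2.0 * (n + kappa))
    return [(mean, w0), (mean + spread, wi), (mean - spread, wi)]


def moments(points: SigmaSet) -> Tuple[float, float]:
    """Weighted mean and variance of a sigma-point set."""
    mean = sum(w * x for x, w in points)
    var = sum(w * (x - mean) ** 2 for x, w in points)
    return mean, var


def expand(points: SigmaSet,
           f: Callable[[float], float],
           g: Callable[[float], float]) -> SigmaSet:
    """Expansion: each point x_i maps to the assumed Gaussian N(f(x_i), g(x_i)^2),
    which gets its own sigma points; weights multiply, so 3 points become 9."""
    expanded: SigmaSet = []
    for x, w in points:
        for y, v in sigma_points(f(x), g(x) ** 2):
            expanded.append((y, w * v))
    return expanded


def compress(points: SigmaSet) -> SigmaSet:
    """Compression: match the first two moments of the expanded set and
    regenerate a fixed-size sigma-point set (9 points back to 3)."""
    return sigma_points(*moments(points))


def predict(mean0: float, var0: float,
            f: Callable[[float], float],
            g: Callable[[float], float],
            horizon: int) -> List[Tuple[float, float]]:
    """Roll the state distribution forward; the set size stays at 3 every step."""
    points = sigma_points(mean0, var0)
    trajectory = [moments(points)]
    for _ in range(horizon):
        points = compress(expand(points, f, g))
        trajectory.append(moments(points))
    return trajectory


if __name__ == "__main__":
    def f(x: float) -> float:
        return x + 0.1 * math.sin(x)  # assumed nonlinear mean dynamics

    def g(x: float) -> float:
        return 0.05 + 0.02 * abs(x)  # assumed state-dependent noise std

    for t, (m, v) in enumerate(predict(1.0, 0.01, f, g, horizon=5)):
        print(f"t={t} mean={m:.4f} var={v:.6f}")
```

Because each step collapses the expanded set back to a fixed size, the per-step cost stays constant over the prediction horizon. Under the same assumptions, an n-dimensional state with n-dimensional noise would expand from 2n+1 to (2n+1)^2 points before compression.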