Leveraging Reward Gradients for Reinforcement Learning in Differentiable Physics Simulations

Paper Title

Leveraging Reward Gradients For Reinforcement Learning in Differentiable Physics Simulations

Authors

Sean Gillen, Katie Byl

Abstract

In recent years, fully differentiable rigid-body physics simulators have been developed that can simulate a wide range of robotic systems. In the context of reinforcement learning for control, these simulators theoretically allow algorithms to be applied directly to analytic gradients of the reward function. To date, however, these gradients have proved extremely challenging to use, and methods built on them are outclassed by algorithms that use no gradient information at all. In this work, we present a novel algorithm, cross entropy analytic policy gradients, that leverages these gradients to outperform state-of-the-art deep reinforcement learning on a set of challenging nonlinear control problems.
