Paper Title
On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
Paper Authors
Paper Abstract
KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
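For reference, a minimal sketch of the standard KL-regularized objective the abstract refers to (not necessarily the paper's exact formulation) augments the expected return with a KL penalty toward a behavioral reference policy \(\pi_0\) derived from expert demonstrations, where \(r\) is the reward, \(\gamma\) the discount factor, and \(\alpha\) an assumed regularization temperature:

\[
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t, a_t) \;-\; \alpha\, D_{\mathrm{KL}}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big)\Big)\right]
\]

Under this formulation, the quality and class of \(\pi_0\) (parametric vs. non-parametric) directly shapes the regularization term, which is where the pathological training dynamics described above can arise.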