Paper Title
Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards
Authors
Abstract
The application of reinforcement learning (RL) to robotic control is still limited in environments with sparse and delayed rewards. In this paper, we propose a practical self-imitation learning method named Self-Imitation Learning with Constant Reward (SILCR). Instead of requiring hand-defined immediate rewards from the environment, our method assigns a constant immediate reward to each timestep of an episode according to that episode's final episodic reward. In this way, even if dense rewards from the environment are unavailable, every action taken by the agent is guided properly. We demonstrate the effectiveness of our method on several challenging continuous robotic control tasks in MuJoCo simulation, and the results show that it significantly outperforms alternative methods on tasks with sparse and delayed rewards. Even compared with alternatives that have dense rewards available, our method achieves competitive performance. Ablation experiments also show the stability and reproducibility of our method.
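The constant-reward assignment described in the abstract can be illustrated with a minimal sketch. The abstract does not state the constant values or the criterion that separates good episodes from bad ones, so the names `Transition` and `relabel_with_constant_reward`, the `success_threshold`, and the values 1.0 and 0.0 below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One step of an episode (hypothetical container for this sketch)."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float


def relabel_with_constant_reward(
    episode: List[Transition],
    episodic_reward: float,
    success_threshold: float = 0.0,  # assumed criterion, not from the paper
    success_value: float = 1.0,      # assumed constant for good episodes
    failure_value: float = 0.0,      # assumed constant for bad episodes
) -> List[Transition]:
    """Overwrite every per-step reward with one constant chosen from the
    final episodic reward, so each action receives a learning signal even
    when the environment only reports an outcome at the end."""
    if episodic_reward > success_threshold:
        constant = success_value
    else:
        constant = failure_value
    return [step._replace(reward=constant) for step in episode]


# A sparse, delayed episode: only the last step carries a reward.
episode = [
    Transition((0.0,), (0.1,), 0.0),
    Transition((0.1,), (0.2,), 0.0),
    Transition((0.3,), (0.0,), 1.0),
]
episodic_reward = sum(step.reward for step in episode)
relabeled = relabel_with_constant_reward(episode, episodic_reward)
```

In this sketch the relabeled transitions would then be stored in a replay buffer and used by an off-policy learner in place of the environment's own rewards. How SILCR actually stores and samples them is not specified in the abstract.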