Paper title
Divide & Conquer Imitation Learning
Paper authors
Paper abstract
When cast into the Deep Reinforcement Learning framework, many robotics tasks require solving a long-horizon, sparse-reward problem, where learning algorithms struggle. In such a context, Imitation Learning (IL) can be a powerful approach to bootstrap the learning process. However, most IL methods require several expert demonstrations, which can be prohibitively difficult to acquire. Only a handful of IL algorithms have shown efficiency in an extremely low expert data regime, where a single expert demonstration is available. In this paper, we present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory. Based on a sequential inductive bias, our method divides the complex task into smaller skills. The skills are learned by a goal-conditioned policy that can solve each skill individually and chain them to solve the entire task. We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
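To make the skill-chaining idea concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes subgoals are extracted by evenly subsampling the expert's state trajectory and that a goal-conditioned policy pursues each subgoal in turn until it lies within a distance threshold. The names (`extract_subgoals`, `chain_skills`), the even-spacing heuristic, and the placeholder `policy`/`env_step` callables are all illustrative assumptions.

```python
import numpy as np

def extract_subgoals(expert_states, num_skills):
    """Split an expert state trajectory into evenly spaced subgoals.

    Assumption: skills are delimited by subsampling the trajectory;
    the paper's actual segmentation may differ.
    """
    idx = np.linspace(0, len(expert_states) - 1, num_skills + 1).astype(int)
    return [expert_states[i] for i in idx[1:]]  # drop the start state

def chain_skills(env_step, start_state, subgoals, policy, tol=0.1, max_steps=50):
    """Chain skills: pursue each subgoal with a goal-conditioned policy.

    env_step(state, action) -> next state (placeholder dynamics)
    policy(state, goal)     -> action      (placeholder goal-conditioned policy)
    """
    state = start_state
    for goal in subgoals:
        for _ in range(max_steps):
            if np.linalg.norm(state - goal) < tol:
                break  # subgoal reached: switch to the next skill
            state = env_step(state, policy(state, goal))
    return state
```

As a toy usage example, a 1-D point with dynamics `s + 0.1 * a` and a bang-bang policy `sign(goal - state)` follows an expert trajectory from 0 to 1 through the intermediate subgoals.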