Paper title
Divide & Conquer Imitation Learning
Paper authors
Paper abstract
When cast into the Deep Reinforcement Learning framework, many robotics tasks require solving a long-horizon, sparse-reward problem, where learning algorithms struggle. In such a context, Imitation Learning (IL) can be a powerful approach to bootstrap the learning process. However, most IL methods require several expert demonstrations, which can be prohibitively difficult to acquire. Only a handful of IL algorithms have shown efficiency in an extremely low expert data regime, where a single expert demonstration is available. In this paper, we present a novel algorithm designed to imitate complex robotic tasks from the states of an expert trajectory. Based on a sequential inductive bias, our method divides the complex task into smaller skills. The skills are learned by a goal-conditioned policy that can solve each skill individually and chain them to solve the entire task. We show that our method imitates a non-holonomic navigation task and scales to a complex simulated robotic manipulation task with very high sample efficiency.
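To make the skill-chaining idea concrete, the following is a minimal sketch, not the paper's actual algorithm: it assumes subgoals are extracted by evenly subsampling the expert's state trajectory and that a goal-conditioned policy pursues each subgoal in turn until it lies within a distance threshold. The names (`extract_subgoals`, `chain_skills`), the even-spacing heuristic, and the placeholder `policy`/`env_step` callables are all illustrative assumptions.

```python
import numpy as np

def extract_subgoals(expert_states, num_skills):
    """Split an expert state trajectory into evenly spaced subgoals.

    Assumption: skills are delimited by subsampling the trajectory;
    the paper's actual segmentation may differ.
    """
    idx = np.linspace(0, len(expert_states) - 1, num_skills + 1).astype(int)
    return [expert_states[i] for i in idx[1:]]  # drop the start state

def chain_skills(env_step, start_state, subgoals, policy, tol=0.1, max_steps=50):
    """Chain skills: pursue each subgoal with a goal-conditioned policy.

    env_step(state, action) -> next state (placeholder dynamics)
    policy(state, goal)     -> action      (placeholder goal-conditioned policy)
    """
    state = start_state
    for goal in subgoals:
        for _ in range(max_steps):
            if np.linalg.norm(state - goal) < tol:
                break  # subgoal reached: switch to the next skill
            state = env_step(state, policy(state, goal))
    return state
```

As a toy usage example, a 1-D point with dynamics `s + 0.1 * a` and a bang-bang policy `sign(goal - state)` follows an expert trajectory from 0 to 1 through the intermediate subgoals.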