Paper Title

Disturbance-Injected Robust Imitation Learning with Task Achievement

Paper Authors

Hirotaka Tahara, Hikaru Sasaki, Hanbit Oh, Brendan Michael, Takamitsu Matsubara

Abstract

Robust imitation learning using disturbance injections overcomes issues of limited variation in demonstrations. However, these methods assume demonstrations are optimal, and that policy stabilization can be learned via simple augmentations. In real-world scenarios, demonstrations are often of diverse quality, and disturbance injection instead learns sub-optimal policies that fail to replicate desired behavior. To address this issue, this paper proposes a novel imitation learning framework that combines both policy robustification and optimal demonstration learning. Specifically, this combinatorial approach forces policy learning and disturbance injection optimization to focus mainly on learning from high task achievement demonstrations, while utilizing low achievement ones to decrease the number of samples needed. The effectiveness of the proposed method is verified through experiments using an excavation task in both simulations and a real robot, resulting in high-achieving policies that are more stable and robust to diverse-quality demonstrations. In addition, this method utilizes all of the weighted sub-optimal demonstrations without eliminating them, resulting in practical data efficiency benefits.
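The core idea described in the abstract can be illustrated with a toy example: behavioral cloning that weights each demonstration by its task achievement, so high-achievement demonstrations dominate the fit while low-achievement ones still contribute. This is a minimal sketch under strong assumptions (a linear policy, synthetic data, a simple softmax weighting, and a fixed Gaussian disturbance during data collection); it is not the paper's actual algorithm, which jointly optimizes the injected disturbance level as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert: a linear policy a = W_true @ s.
W_true = np.array([[1.0, -0.5]])
states = rng.normal(size=(200, 2))

# Diverse-quality demonstrations: each demo has a task-achievement
# score in [0, 1]; lower-achievement demos deviate more from optimal.
achievement = rng.uniform(0.2, 1.0, size=200)
quality_noise = rng.normal(size=(200, 1)) * (1.0 - achievement)[:, None]

# Disturbance injection (simplified): perturb states during collection
# so demonstrations cover recovery behavior around the expert trajectory.
noisy_states = states + rng.normal(scale=0.1, size=states.shape)
actions = noisy_states @ W_true.T + quality_noise

# Achievement-weighted behavioral cloning: softmax-style weights
# emphasize high-achievement demos without discarding the rest.
weights = np.exp(5.0 * achievement)
weights /= weights.sum()

# Weighted least-squares fit of the linear policy a = W_fit @ s.
w_sqrt = np.sqrt(weights)[:, None]
W_fit, *_ = np.linalg.lstsq(noisy_states * w_sqrt, actions * w_sqrt, rcond=None)
W_fit = W_fit.T
print(np.round(W_fit, 2))
```

Because the weights concentrate on high-achievement (low-noise) samples, the recovered policy stays close to the expert even though none of the sub-optimal demonstrations were thrown away.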
