Paper Title

Robust Imitation Learning from Corrupted Demonstrations

Paper Authors

Liu Liu, Ziyang Tang, Lanqing Li, Dijun Luo

Abstract


We consider offline Imitation Learning from corrupted demonstrations, where a constant fraction of the data may be noisy or even arbitrary outliers. Classical approaches such as Behavior Cloning assume that demonstrations are collected by a presumably optimal expert, and hence may fail drastically when learning from corrupted demonstrations. We propose a novel robust algorithm that minimizes a Median-of-Means (MOM) objective, which guarantees accurate estimation of the policy even in the presence of a constant fraction of outliers. Our theoretical analysis shows that in the corrupted setting our robust method enjoys nearly the same error scaling and sample-complexity guarantees as classical Behavior Cloning in the expert-demonstration setting. Our experiments on continuous-control benchmarks validate that our method exhibits the predicted robustness and effectiveness, and achieves competitive results compared to existing imitation learning methods.
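The Median-of-Means idea in the abstract can be sketched as follows; this is a minimal illustration under assumed details (the function name `mom_bc_loss`, the number of blocks, and the use of per-sample negative log-likelihood losses are not specified by the paper text above):

```python
import numpy as np

def mom_bc_loss(losses, n_blocks=5, seed=0):
    """Median-of-Means estimate of a Behavior Cloning loss.

    `losses` holds per-demonstration imitation losses (e.g. negative
    log-likelihoods of expert actions under the current policy).
    Randomly partitioning the samples into blocks and taking the
    median of the block means bounds the influence of a constant
    fraction of outlier demonstrations: outliers can corrupt at most
    a few block means, and the median ignores those blocks.
    """
    losses = np.asarray(losses, dtype=float)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(losses))           # random block assignment
    blocks = np.array_split(losses[perm], n_blocks)
    block_means = [block.mean() for block in blocks]
    return float(np.median(block_means))

# Illustration: 3% arbitrary outliers barely move the MOM estimate,
# while the plain empirical mean is dominated by them.
clean_and_corrupted = [1.0] * 97 + [1000.0] * 3
robust = mom_bc_loss(clean_and_corrupted, n_blocks=7)
naive = float(np.mean(clean_and_corrupted))
```

With 3 outliers and 7 blocks, at most 3 of the 7 block means are contaminated, so the median is always taken over clean blocks; the naive mean, by contrast, is pulled above 30.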
