Paper Title

Error Bounds of Imitating Policies and Environments

Paper Authors

Tian Xu, Ziniu Li, Yang Yu

Paper Abstract

Imitation learning trains a policy by mimicking expert demonstrations. Various imitation methods have been proposed and empirically evaluated; meanwhile, their theoretical understanding needs further study. In this paper, we first analyze the value gap between the expert policy and the policies imitated by two methods, behavioral cloning and generative adversarial imitation. The results support that generative adversarial imitation can reduce compounding errors compared to behavioral cloning and thus has better sample complexity. Noting that the environment transition model can be viewed as a dual agent, imitation learning can also be used to learn the environment model. Therefore, based on the bounds for imitating policies, we further analyze the performance of imitating environments. The results show that environment models can be more effectively imitated by generative adversarial imitation than by behavioral cloning, suggesting a novel application of adversarial imitation for model-based reinforcement learning. We hope these results can inspire future advances in imitation learning and model-based reinforcement learning.
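
The contrast the abstract describes can be summarized schematically: in a γ-discounted MDP, behavioral cloning's value gap tends to compound quadratically in the effective horizon 1/(1-γ), while generative adversarial imitation's gap grows only linearly in it. The display below is a qualitative sketch of that contrast, with ε standing for a per-state imitation error; the exact constants, error measures, and assumptions are simplified here and should be taken from the paper itself.

\[
\underbrace{V(\pi_E) - V(\pi_{\mathrm{BC}}) \;\lesssim\; \frac{\epsilon}{(1-\gamma)^{2}}}_{\text{behavioral cloning (compounding errors)}}
\qquad \text{vs.} \qquad
\underbrace{V(\pi_E) - V(\pi_{\mathrm{GAIL}}) \;\lesssim\; \frac{\epsilon}{1-\gamma}}_{\text{generative adversarial imitation}}
\]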
