Paper Title
Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control
Paper Authors
Abstract
While Deep Reinforcement Learning (DRL) has emerged as a promising approach to many complex tasks, it remains challenging to train a single DRL agent that is capable of undertaking multiple different continuous control tasks. In this paper, we present a Knowledge Transfer based Multi-task Deep Reinforcement Learning framework (KTM-DRL) for continuous control, which enables a single DRL agent to achieve expert-level performance in multiple different tasks by learning from task-specific teachers. In KTM-DRL, the multi-task agent first leverages an offline knowledge transfer algorithm designed particularly for the actor-critic architecture to quickly learn a control policy from the experience of task-specific teachers, and then it employs an online learning algorithm to further improve itself by learning from new online transition samples under the guidance of those teachers. We perform a comprehensive empirical study with two commonly used benchmarks in the MuJoCo continuous control task suite. The experimental results demonstrate the effectiveness of KTM-DRL and its knowledge transfer and online learning algorithms, as well as its superiority over the state of the art by a large margin.
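The offline knowledge transfer step described above can be sketched in a minimal, hypothetical form: the student actor is regressed toward the teacher's actions and the student critic toward the teacher's Q-values on replayed transitions. This is an illustrative sketch, not the paper's implementation; all function and variable names here are assumptions, and the actual KTM-DRL algorithm includes additional details (e.g., task-specific hierarchical experience replay and the subsequent online learning stage) not shown.

```python
# Hedged sketch of actor-critic knowledge distillation from a task-specific
# teacher. All names (mse, offline_transfer_losses, ...) are hypothetical
# illustrations, not the paper's actual code.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def offline_transfer_losses(batch, student_actor, student_critic,
                            teacher_actor, teacher_critic):
    """For each state in the batch, compute supervised losses that pull
    the student's action toward the teacher's action and the student's
    Q-value toward the teacher's Q-value (evaluated at the teacher's
    action). Returns (actor_loss, critic_loss) averaged over the batch."""
    actor_loss = 0.0
    critic_loss = 0.0
    for state in batch:
        teacher_action = teacher_actor(state)
        # Actor imitation loss: match the teacher's action output.
        actor_loss += mse(student_actor(state), teacher_action)
        # Critic regression loss: match the teacher's value estimate.
        diff = student_critic(state, teacher_action) \
             - teacher_critic(state, teacher_action)
        critic_loss += diff ** 2
    n = len(batch)
    return actor_loss / n, critic_loss / n

# Tiny usage check with toy deterministic actor/critic functions:
# when student and teacher are identical, both losses are zero.
batch = [[1.0, 2.0], [0.5, -1.0]]
actor = lambda s: [0.1 * x for x in s]
critic = lambda s, a: sum(s) + sum(a)
a_loss, c_loss = offline_transfer_losses(batch, actor, critic, actor, critic)
```

In an actual implementation these losses would be minimized by gradient descent on the student networks' parameters; the sketch only shows the loss structure the abstract's "learn a control policy from the experience of task-specific teachers" refers to.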