Paper Title
Modular Adaptive Policy Selection for Multi-Task Imitation Learning through Task Division
Authors
Abstract
Deep imitation learning requires many expert demonstrations, which can be hard to obtain, especially when many tasks are involved. However, different tasks often share similarities, so learning them jointly can greatly benefit each task and reduce the number of demonstrations needed. Yet joint multi-task learning often suffers from negative transfer, in which information that should be task-specific is shared across tasks. In this work, we introduce a method that performs multi-task imitation while allowing for task-specific features. It uses proto-policies as modules that divide the tasks into simple sub-behaviours that can be shared. The proto-policies operate in parallel and are adaptively chosen by a selector mechanism that is trained jointly with the modules. Experiments on different sets of tasks show that our method improves upon the accuracy of single agents, task-conditioned and multi-headed multi-task agents, and state-of-the-art meta-learning agents. We also demonstrate its ability to autonomously divide the tasks into both shared and task-specific sub-behaviours.
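The idea of proto-policies that run in parallel and are weighted by a selector can be illustrated with a minimal sketch. This is not the paper's implementation: the linear proto-policies, the softmax selector conditioned on the state and a one-hot task label, the random untrained weights, and every name and dimension below are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this example.
STATE_DIM, ACTION_DIM, NUM_TASKS, NUM_PROTOS = 4, 2, 3, 5

# Assumption: each proto-policy is a linear map from state to action.
proto_weights = rng.normal(size=(NUM_PROTOS, ACTION_DIM, STATE_DIM))
# Assumption: the selector maps [state, one-hot task] to one logit per proto-policy.
selector_weights = rng.normal(size=(NUM_PROTOS, STATE_DIM + NUM_TASKS))


def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def act(state, task_id):
    """Return the combined action and the selector weights for one state."""
    task_onehot = np.eye(NUM_TASKS)[task_id]
    # All proto-policies see the same state and propose an action in parallel.
    proto_actions = proto_weights @ state  # shape (NUM_PROTOS, ACTION_DIM)
    # The selector assigns a weight to each proto-policy for this state and task.
    selection = softmax(selector_weights @ np.concatenate([state, task_onehot]))
    # The final action is the selection-weighted sum of the proposals.
    action = selection @ proto_actions  # shape (ACTION_DIM,)
    return action, selection


action, selection = act(np.ones(STATE_DIM), task_id=1)
```

In a trained system, the selector and the proto-policies would be optimised together on expert demonstrations, so that a proto-policy receiving high weight across several tasks would correspond to a shared sub-behaviour and one used by a single task to a task-specific sub-behaviour.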