Autonomous learning of multiple, context-dependent tasks

Paper title

Autonomous learning of multiple, context-dependent tasks

Authors

Santucci, Vieri Giuliano; Montella, Davide; da Silva, Bruno Castro; Baldassarre, Gianluca

Abstract

When facing the problem of autonomously learning multiple tasks with reinforcement learning systems, researchers typically focus on solutions where just one parametrised policy per task is sufficient to solve it. However, in complex environments presenting different contexts, the same task might need a set of different skills to be solved. These situations pose two challenges: (a) to recognise the different contexts that need different policies; (b) to quickly learn the policies that accomplish the same tasks in the newly discovered contexts. These two challenges are even harder when faced within an open-ended learning framework, where an agent has to autonomously discover the goals that it might accomplish in a given environment and also learn the motor skills to accomplish them. We propose a novel open-ended learning robot architecture, C-GRAIL, that solves the two challenges in an integrated fashion. In particular, the architecture is able to detect new relevant contexts, and ignore irrelevant ones, on the basis of the decrease of the expected performance for a given goal. Moreover, the architecture can quickly learn the policies for the new contexts by exploiting transfer learning to import knowledge from already acquired policies. The architecture is tested in a simulated robotic environment involving a robot that autonomously learns to reach relevant target objects in the presence of multiple obstacles generating several different contexts. The proposed architecture outperforms other models that do not use the proposed autonomous context-discovery and transfer-learning mechanisms.
