Paper Title
Lifelong Reinforcement Learning with Modulating Masks
Paper Authors
Paper Abstract
Lifelong learning aims to create AI systems that continuously and incrementally learn during a lifetime, similar to biological learning. Attempts so far have encountered problems, including catastrophic forgetting, interference among tasks, and the inability to exploit previous knowledge. While considerable research has focused on learning multiple supervised classification tasks that involve changes in the input distribution, lifelong reinforcement learning (LRL) must deal with variations in the state and transition distributions, and in the reward functions. Modulating masks with a fixed backbone network, recently developed for classification, are particularly suitable for dealing with such a large spectrum of task variations. In this paper, we adapted modulating masks to work with deep LRL, specifically PPO and IMPALA agents. The comparison with LRL baselines in both discrete and continuous RL tasks shows superior performance. We further investigated the use of a linear combination of previously learned masks to exploit previous knowledge when learning new tasks: not only is learning faster, but the algorithm also solves tasks that we could not otherwise solve from scratch due to extremely sparse rewards. The results suggest that RL with modulating masks is a promising approach to lifelong learning, to the composition of knowledge to learn increasingly complex tasks, and to knowledge reuse for efficient and faster learning.
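To make the core idea concrete, the following is a minimal, hypothetical sketch of the two mechanisms the abstract describes: a per-task binary mask that modulates a frozen backbone weight matrix, and a linear combination of previously learned masks for knowledge reuse. The class and function names (`MaskedLinear`, `combine_masks`) and the 0.5 threshold are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class MaskedLinear:
    """One linear layer of a fixed backbone network.

    The backbone weights are frozen; only the per-task score matrix is
    trained. The binary mask (scores > 0) modulates the fixed weights,
    so each task is encoded entirely by its mask.
    """

    def __init__(self, in_dim: int, out_dim: int):
        self.weight = rng.standard_normal((out_dim, in_dim))  # frozen backbone
        self.scores = rng.standard_normal((out_dim, in_dim))  # learned per task

    def mask(self) -> np.ndarray:
        # Threshold the real-valued scores into a binary {0, 1} mask.
        return (self.scores > 0).astype(self.weight.dtype)

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Apply the mask element-wise to the fixed weights.
        return (self.weight * self.mask()) @ x

def combine_masks(masks: list[np.ndarray], coeffs: list[float]) -> np.ndarray:
    """Knowledge reuse: a weighted sum of previously learned masks,
    thresholded back to a binary mask to initialize a new task.
    The 0.5 threshold is an illustrative choice."""
    blended = sum(c * m for c, m in zip(coeffs, masks))
    return (blended > 0.5).astype(float)
```

When a new task arrives, the agent could train the combination coefficients (and then the scores) while the backbone stays untouched, which is what makes forgetting-free storage of one mask per task possible.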