Paper title
L2Explorer: A Lifelong Reinforcement Learning Assessment Environment
Paper authors
Paper abstract
Despite groundbreaking progress in reinforcement learning for robotics, gameplay, and other complex domains, major challenges remain in applying reinforcement learning to the evolving, open-world problems often found in critical application spaces. Reinforcement learning solutions tend to generalize poorly when exposed to new tasks outside of the data distribution they are trained on, prompting an interest in continual learning algorithms. In tandem with research on continual learning algorithms, there is a need for challenge environments, carefully designed experiments, and metrics to assess research progress. We address the latter need by introducing a framework for continual reinforcement-learning development and assessment using Lifelong Learning Explorer (L2Explorer), a new, Unity-based, first-person 3D exploration environment that can be continuously reconfigured to generate a range of tasks and task variants structured into complex and evolving evaluation curricula. In contrast to procedurally generated worlds with randomized components, we have developed a systematic approach to defining curricula in response to controlled changes with accompanying metrics to assess transfer, performance recovery, and data efficiency. Taken together, the L2Explorer environment and evaluation approach provide a framework for developing future evaluation methodologies in open-world settings and rigorously evaluating approaches to lifelong learning.