Paper Title
Deep Model-Based Reinforcement Learning for High-Dimensional Problems, a Survey
Paper Authors
Paper Abstract
Deep reinforcement learning has shown remarkable success in the past few years. Highly complex sequential decision making problems have been solved in tasks such as game playing and robotics. Unfortunately, the sample complexity of most deep reinforcement learning methods is high, precluding their use in some important applications. Model-based reinforcement learning creates an explicit model of the environment dynamics to reduce the need for environment samples. Current deep learning methods use high-capacity networks to solve high-dimensional problems. Unfortunately, high-capacity models typically require many samples, negating the potential benefit of lower sample complexity in model-based methods. A challenge for deep model-based methods is therefore to achieve high predictive power while maintaining low sample complexity. In recent years, many model-based methods have been introduced to address this challenge. In this paper, we survey the contemporary model-based landscape. First we discuss definitions and relations to other fields. We propose a taxonomy based on three approaches: using explicit planning on given transitions, using explicit planning on learned transitions, and end-to-end learning of both planning and transitions. We use these approaches to organize a comprehensive overview of important recent developments such as latent models. We describe methods and benchmarks, and we suggest directions for future work for each of the approaches. Among promising research directions are curriculum learning, uncertainty modeling, and use of latent models for transfer learning.
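To make the second branch of the taxonomy concrete ("explicit planning on learned transitions"), below is a minimal sketch of Dyna-Q, a classic tabular algorithm in that family: the agent learns a transition model from real environment samples and then performs extra value updates on simulated transitions drawn from that model, which is exactly the mechanism by which model-based methods reduce sample complexity. This sketch is not from the surveyed paper; the corridor environment, hyperparameters, and all function names are illustrative assumptions.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor environment, used purely for illustration.
N_STATES = 10          # corridor cells; reaching the last cell gives reward 1
ACTIONS = [-1, +1]     # move left / move right

def step(state, action):
    """Environment dynamics (unknown to the agent)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

Q = defaultdict(float)              # action-value estimates, keyed by (state, action)
model = {}                          # learned deterministic model: (s, a) -> (r, s')
alpha, gamma, eps = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate
planning_steps = 20                 # simulated updates from the model per real step

def greedy(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(50):
    state, done = 0, False
    while not done:
        action = random.choice(ACTIONS) if random.random() < eps else greedy(state)
        next_state, reward, done = step(state, action)   # one real sample

        # Direct RL update from the real transition.
        target = reward + gamma * Q[(next_state, greedy(next_state))]
        Q[(state, action)] += alpha * (target - Q[(state, action)])

        # Model learning: remember the observed transition.
        model[(state, action)] = (reward, next_state)

        # Planning: replay simulated transitions from the learned model,
        # trading cheap model queries for expensive environment samples.
        for _ in range(planning_steps):
            s, a = random.choice(list(model.keys()))
            r, s2 = model[(s, a)]
            t = r + gamma * Q[(s2, greedy(s2))]
            Q[(s, a)] += alpha * (t - Q[(s, a)])

        state = next_state

print("Greedy action per state:", [greedy(s) for s in range(N_STATES)])
```

The deep model-based methods covered by the survey replace the tabular dictionary `model` with a high-capacity learned dynamics network (possibly over a latent state space) and the random replay loop with an explicit or learned planner, but the core sample-efficiency argument is the same as in this sketch.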