Paper Title
Model-based Lifelong Reinforcement Learning with Bayesian Exploration
Paper Authors
Paper Abstract
We propose a model-based lifelong reinforcement-learning approach that estimates a hierarchical Bayesian posterior distilling the common structure shared across different tasks. The learned posterior, combined with a sample-based Bayesian exploration procedure, increases the sample efficiency of learning across a family of related tasks. We first derive an analysis of the relationship between the sample complexity and the initialization quality of the posterior in the finite MDP setting. We next scale the approach to continuous-state domains by introducing a Variational Bayesian Lifelong Reinforcement Learning algorithm that can be combined with recent model-based deep RL methods and that exhibits backward transfer. Experimental results on several challenging domains show that our algorithms achieve better forward and backward transfer performance than state-of-the-art lifelong RL methods.
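The abstract refers to a sample-based Bayesian exploration procedure. As background only, the sketch below illustrates the generic posterior-sampling (Thompson-style) idea in a tabular MDP; it is not the paper's algorithm. The `env` object with `reset()`/`step()` returning `(next_state, reward, done)`, the known reward table `R`, and the `value_iteration` helper are illustrative assumptions.

```python
# Minimal sketch (not the paper's algorithm) of sample-based Bayesian exploration
# in a finite MDP: Dirichlet posteriors over transitions, one model sampled from
# the posterior per episode, and greedy acting under the sampled model.
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """Q-values for a known tabular model P[s, a, s'] and reward table R[s, a]."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy state values
        Q = R + gamma * P @ V      # one Bellman backup under the sampled model
    return Q

def run_episode(env, counts, R, prior=1.0, gamma=0.95, horizon=50):
    """One episode of posterior-sampling exploration; counts[s, a, s'] are
    Dirichlet sufficient statistics, env is an assumed tabular environment."""
    S, A, _ = counts.shape
    # Sample a transition model from the Dirichlet posterior (prior + counts).
    P = np.array([[np.random.dirichlet(prior + counts[s, a]) for a in range(A)]
                  for s in range(S)])
    Q = value_iteration(P, R, gamma)
    s = env.reset()
    for _ in range(horizon):
        a = int(Q[s].argmax())                 # act greedily w.r.t. sampled model
        s_next, r, done = env.step(a)
        counts[s, a, s_next] += 1              # update posterior statistics
        s = s_next
        if done:
            break
    return counts
```

In a lifelong setting, the prior in such a loop would be replaced by a posterior distilled from previously seen tasks, which is the role the paper assigns to its hierarchical Bayesian posterior.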