Paper Title
Random Actions vs Random Policies: Bootstrapping Model-Based Direct Policy Search
Paper Authors
Paper Abstract
This paper studies how the method used to gather initial data affects the subsequent learning of a dynamics model. A dynamics model approximates the true transition function of a given task, so that policy search can be performed directly on the model rather than on the costly real system. The study aims to determine how to bootstrap such a model as efficiently as possible by comparing the initialization methods employed in two different policy search frameworks from the literature. It focuses on model performance under the episode-based framework of evolutionary methods, using probabilistic ensembles. Experimental results show that various task-dependent factors can be detrimental to each method, which suggests that hybrid approaches are worth exploring.
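The two initialization schemes named in the title can be contrasted with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy point-mass `step` function, the linear `tanh` policy, the sampling distributions, and the helper names `collect_random_actions` and `collect_random_policies` are all hypothetical. The sketch only shows the structural difference: random actions are redrawn at every time step, whereas a random policy is drawn once per episode and then followed.

```python
import numpy as np


def step(state, action):
    """Toy 1-D point mass; a hypothetical stand-in for the costly real system."""
    pos, vel = state
    vel = vel + 0.1 * float(np.clip(action, -1.0, 1.0))
    pos = pos + 0.1 * vel
    return np.array([pos, vel])


def collect_random_actions(n_episodes, horizon, rng):
    """Random actions: draw a fresh uniform action at every time step."""
    data = []
    for _ in range(n_episodes):
        s = np.zeros(2)
        for _ in range(horizon):
            a = float(rng.uniform(-1.0, 1.0))
            s_next = step(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data


def collect_random_policies(n_episodes, horizon, rng):
    """Random policies: draw one policy per episode and follow it throughout."""
    data = []
    for _ in range(n_episodes):
        # Assumed policy form: a linear map squashed by tanh, with
        # parameters sampled once at the start of the episode.
        w = rng.normal(size=2)
        b = rng.normal()
        s = np.zeros(2)
        for _ in range(horizon):
            a = float(np.tanh(w @ s + b))
            s_next = step(s, a)
            data.append((s, a, s_next))
            s = s_next
    return data


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for collect in (collect_random_actions, collect_random_policies):
        transitions = collect(n_episodes=5, horizon=20, rng=rng)
        visited = np.array([s for s, _, _ in transitions])
        print(collect.__name__, "position range:",
              visited[:, 0].min(), visited[:, 0].max())
```

Either function returns a list of `(state, action, next_state)` transitions that could serve as the initial training set for a dynamics model. Comparing the range of states each scheme visits is one simple way to see how the choice of initialization shapes the data the model first learns from.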