Paper Title

Forethought and Hindsight in Credit Assignment

Authors

Veronica Chelu, Doina Precup, Hado van Hasselt

Abstract

We address the problem of credit assignment in reinforcement learning and explore fundamental questions regarding the way in which an agent can best use additional computation to propagate new information, by planning with internal models of the world to improve its predictions. Particularly, we work to understand the gains and peculiarities of planning employed as forethought via forward models or as hindsight operating with backward models. We establish the relative merits, limitations and complementary properties of both planning mechanisms in carefully constructed scenarios. Further, we investigate the best use of models in planning, primarily focusing on the selection of states in which predictions should be (re)-evaluated. Lastly, we discuss the issue of model estimation and highlight a spectrum of methods that stretch from explicit environment-dynamics predictors to more abstract planner-aware models.
