Title

Using Contrastive Samples for Identifying and Leveraging Possible Causal Relationships in Reinforcement Learning

Authors

Harshad Khadilkar, Hardik Meisheri

Abstract

A significant challenge in reinforcement learning is quantifying the complex relationship between actions and long-term rewards. The effects may manifest themselves over a long sequence of state-action pairs, making them hard to pinpoint. In this paper, we propose a method to link transitions involving significant deviations in state to unusually large variations in subsequent rewards. Such transitions are marked as possible causal effects, and the corresponding state-action pairs are added to a separate replay buffer. In addition, we include contrastive samples corresponding to transitions from a similar state but with differing actions. Including this Contrastive Experience Replay (CER) during training is shown to outperform standard value-based methods on 2D navigation tasks. We believe that CER can be useful for a broad class of learning tasks, including for any off-policy reinforcement learning algorithm.
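The mechanism described in the abstract can be sketched as a replay buffer that flags transitions with unusually large reward deviations and pairs them with contrastive transitions (similar state, different action). This is a minimal illustrative sketch, not the paper's implementation: the z-score threshold, the Euclidean state-similarity radius, and the causal/ordinary sampling mix are all assumed parameters.

```python
import random
from collections import deque

import numpy as np

class ContrastiveReplayBuffer:
    """Sketch of a Contrastive Experience Replay (CER) style buffer.

    Transitions whose reward deviates strongly from the running mean are
    flagged as possible causal effects and copied to a separate buffer,
    together with contrastive transitions from similar states but with
    different actions. Thresholds and the similarity metric are
    illustrative assumptions, not taken from the paper.
    """

    def __init__(self, capacity=10000, reward_z_threshold=2.0, state_eps=0.5):
        self.main = deque(maxlen=capacity)        # ordinary replay buffer
        self.causal = deque(maxlen=capacity)      # flagged + contrastive samples
        self.reward_z_threshold = reward_z_threshold  # assumed outlier cutoff
        self.state_eps = state_eps                    # assumed similarity radius
        self._rewards = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        transition = (np.asarray(state, float), action, reward,
                      np.asarray(next_state, float))
        self.main.append(transition)
        self._rewards.append(reward)
        # Flag transitions whose reward is an outlier w.r.t. the running stats.
        if len(self._rewards) > 10:
            mean = np.mean(self._rewards)
            std = np.std(self._rewards) + 1e-8
            if abs(reward - mean) / std > self.reward_z_threshold:
                self.causal.append(transition)
                # Contrastive samples: nearby states, but different actions.
                for s, a, r, ns in self.main:
                    if a != action and np.linalg.norm(s - transition[0]) < self.state_eps:
                        self.causal.append((s, a, r, ns))

    def sample(self, batch_size, causal_fraction=0.25):
        # Mix ordinary transitions with flagged causal/contrastive ones.
        n_causal = min(int(batch_size * causal_fraction), len(self.causal))
        batch = random.sample(self.main, min(batch_size - n_causal, len(self.main)))
        if n_causal:
            batch += random.sample(self.causal, n_causal)
        return batch
```

Any off-policy learner (e.g. a DQN-style agent) could draw its minibatches from `sample()` unchanged, which is consistent with the abstract's claim that CER applies to any off-policy algorithm.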
