Paper title
What Did You Think Would Happen? Explaining Agent Behaviour Through Intended Outcomes
Authors
Abstract
We present a novel form of explanation for Reinforcement Learning, based around the notion of intended outcome. These explanations describe the outcome an agent is trying to achieve by its actions. We provide a simple proof that general methods for post-hoc explanations of this nature are impossible in traditional reinforcement learning. Rather, the information needed for the explanations must be collected in conjunction with training the agent. We derive approaches designed to extract local explanations based on intention for several variants of Q-function approximation and prove consistency between the explanations and the Q-values learned. We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspect their RL environments and algorithms.
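The idea of collecting explanation data alongside training, and keeping it consistent with the learned Q-values, can be illustrated with a minimal sketch. It is not the paper's released code. It assumes a tabular Q-learning agent on a hypothetical five-state chain with a deterministic reward table, and it assumes the "intended outcome" is stored as a table of expected discounted future state-action visits. All names here (`train_with_belief_map`, `intended_outcome`, `H`, `R`) are illustrative.

```python
import numpy as np

def train_with_belief_map(n_states=5, episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning that also learns a visit table H (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    n_actions = 2  # 0 = move left, 1 = move right
    # Hypothetical deterministic reward: 1 for stepping into the last (goal) state.
    R = np.zeros((n_states, n_actions))
    R[n_states - 2, 1] = 1.0
    Q = np.zeros((n_states, n_actions))
    # H[s, a] holds expected discounted future visits to every (state, action) pair.
    H = np.zeros((n_states, n_actions, n_states, n_actions))
    for _ in range(episodes):
        s = 0
        for _ in range(50):
            # Uniform random behaviour policy; Q-learning is off-policy.
            a = int(rng.integers(n_actions))
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == n_states - 1
            a_star = int(np.argmax(Q[s_next]))  # greedy action used by both updates
            boot = 0.0 if done else 1.0         # no bootstrapping from the goal state
            visit = np.zeros((n_states, n_actions))
            visit[s, a] = 1.0
            # Same learning rate and same greedy action for Q and H keeps them aligned.
            Q[s, a] += alpha * (R[s, a] + gamma * boot * Q[s_next, a_star] - Q[s, a])
            H[s, a] += alpha * (visit + gamma * boot * H[s_next, a_star] - H[s, a])
            if done:
                break
            s = s_next
    return Q, H, R

def intended_outcome(H, s, a):
    """Expected discounted visits per state after taking action a in state s."""
    return H[s, a].sum(axis=1)

Q, H, R = train_with_belief_map()
# Consistency check: each Q-value equals the reward-weighted sum of its visit table.
print(np.allclose(Q, (H * R).sum(axis=(2, 3))))
print(intended_outcome(H, 0, 1))
```

Because both tables start at zero and are updated with the same step size and the same greedy successor action, the reward-weighted sum of `H[s, a]` tracks `Q[s, a]` throughout training in this sketch. With stochastic rewards or function approximation the relationship would need more care.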