Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Paper Title

Beyond Value: CHECKLIST for Testing Inferences in Planning-Based RL

Authors

Kin-Ho Lam, Delyar Tabatabai, Jed Irvine, Donald Bertucci, Anita Ruangrotsakun, Minsuk Kahng, Alan Fern

Abstract

Reinforcement learning (RL) agents are commonly evaluated via their expected value over a distribution of test scenarios. Unfortunately, this evaluation approach provides limited evidence for post-deployment generalization beyond the test distribution. In this paper, we address this limitation by extending the recent CheckList testing methodology from natural language processing to planning-based RL. Specifically, we consider testing RL agents that make decisions via online tree search using a learned transition model and value function. The key idea is to improve the assessment of future performance via a CheckList approach for exploring and assessing the agent's inferences during tree search. The approach provides the user with an interface and general query-rule mechanism for identifying potential inference flaws and validating expected inference invariances. We present a user study involving knowledgeable AI researchers using the approach to evaluate an agent trained to play a complex real-time strategy game. The results show the approach is effective in allowing users to identify previously-unknown flaws in the agent's reasoning. In addition, our analysis provides insight into how AI experts use this type of testing approach, which may help improve future instantiations.
