Paper Title
Generalization to New Actions in Reinforcement Learning
Paper Authors
Paper Abstract
A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework where the agent first infers action representations from action information acquired separately from the task. A policy flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool-set to solve physical reasoning puzzles and stacking towers with novel 3D shapes. Videos and code are available at https://sites.google.com/view/action-generalization
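The core idea in the abstract is a policy that is "flexible to varying action sets": instead of a fixed output head with one logit per action, the policy scores each available action by pairing the state with that action's learned representation, so the same parameters apply to any action set. The sketch below illustrates this structure only; all dimensions, the random "encoder" standing in for the paper's first stage, and the network weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
STATE_DIM, ACTION_REP_DIM, HIDDEN = 8, 4, 16

def encode_actions(n_actions):
    # Stage 1 stand-in: the paper infers action representations from
    # task-independent action information; here random vectors act as
    # placeholders for those learned representations.
    return rng.normal(size=(n_actions, ACTION_REP_DIM))

# Stage 2: shared scoring network, independent of the number of actions.
W1 = rng.normal(size=(STATE_DIM + ACTION_REP_DIM, HIDDEN)) * 0.1
w2 = rng.normal(size=HIDDEN) * 0.1

def policy_probs(state, action_reps):
    # Pair the state with each candidate action's representation,
    # score every pair, and softmax over the *current* action set.
    pairs = np.hstack([np.tile(state, (len(action_reps), 1)), action_reps])
    scores = np.tanh(pairs @ W1) @ w2
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

state = rng.normal(size=STATE_DIM)
train_actions = encode_actions(5)  # action set seen during training
new_actions = encode_actions(7)    # unseen action set of a different size

# The same parameters yield a valid distribution over either set.
p_train = policy_probs(state, train_actions)
p_new = policy_probs(state, new_actions)
```

Because actions enter only through their representations, swapping in a new action set at test time (new tools, new 3D shapes) requires no retraining, which is the zero-shot setting the abstract describes.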