Paper Title
Generalization to New Actions in Reinforcement Learning
Paper Authors
Paper Abstract
A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework where the agent first infers action representations from action information acquired separately from the task. A policy flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool-set to solve physical reasoning puzzles and stacking towers with novel 3D shapes. Videos and code are available at https://sites.google.com/view/action-generalization
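The core idea in the abstract is a policy that is "flexible to varying action sets": instead of a fixed output head with one logit per action, the policy scores each available action by pairing the state with that action's learned representation, so the same parameters apply to any action set. The sketch below illustrates this structure only; all dimensions, the random "encoder" standing in for the paper's first stage, and the network weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
STATE_DIM, ACTION_REP_DIM, HIDDEN = 8, 4, 16

def encode_actions(n_actions):
    # Stage 1 stand-in: the paper infers action representations from
    # task-independent action information; here random vectors act as
    # placeholders for those learned representations.
    return rng.normal(size=(n_actions, ACTION_REP_DIM))

# Stage 2: shared scoring network, independent of the number of actions.
W1 = rng.normal(size=(STATE_DIM + ACTION_REP_DIM, HIDDEN)) * 0.1
w2 = rng.normal(size=HIDDEN) * 0.1

def policy_probs(state, action_reps):
    # Pair the state with each candidate action's representation,
    # score every pair, and softmax over the *current* action set.
    pairs = np.hstack([np.tile(state, (len(action_reps), 1)), action_reps])
    scores = np.tanh(pairs @ W1) @ w2
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

state = rng.normal(size=STATE_DIM)
train_actions = encode_actions(5)  # action set seen during training
new_actions = encode_actions(7)    # unseen action set of a different size

# The same parameters yield a valid distribution over either set.
p_train = policy_probs(state, train_actions)
p_new = policy_probs(state, new_actions)
```

Because actions enter only through their representations, swapping in a new action set at test time (new tools, new 3D shapes) requires no retraining, which is the zero-shot setting the abstract describes.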