Paper Title

Neural Distillation as a State Representation Bottleneck in Reinforcement Learning

Authors

Valentin Guillet, Dennis G. Wilson, Carlos Aguilar-Melchor, Emmanuel Rachelson

Abstract

Learning a good state representation is a critical skill when dealing with multiple tasks in Reinforcement Learning, as it allows for transfer and better generalization between tasks. However, defining what constitutes a useful representation is far from simple, and there is so far no standard method for finding such an encoding. In this paper, we argue that distillation -- a process that aims at imitating a set of given policies with a single neural network -- can be used to learn a state representation displaying favorable characteristics. In this regard, we define three criteria that measure desirable features of a state encoding: the ability to select important variables in the input space, the ability to efficiently separate states according to their corresponding optimal action, and the robustness of the state encoding on new tasks. We first evaluate these criteria and verify the contribution of distillation to state representation on a toy environment based on the standard inverted pendulum problem, before extending our analysis to more complex visual tasks from the Atari and Procgen benchmarks.
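
To make the distillation setup concrete, the sketch below shows one plausible way to imitate several teacher policies with a single multi-headed student network, where a shared encoder plays the role of the representation bottleneck. This is a minimal illustration, not the authors' implementation: it assumes PyTorch, discrete actions, and teacher policies that return action logits, and all names (StudentNet, distill_step, teachers, batches) are hypothetical placeholders.

```python
# Minimal policy-distillation sketch (assumed setup, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentNet(nn.Module):
    """Single network imitating several teacher policies.

    A shared encoder produces the state representation (the "bottleneck");
    one output head per task maps it to action logits.
    """
    def __init__(self, obs_dim, n_actions, n_tasks, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_tasks)]
        )

    def forward(self, obs, task_id):
        # Shared representation, task-specific action logits.
        return self.heads[task_id](self.encoder(obs))

def distill_step(student, teachers, batches, optimizer, temperature=1.0):
    """One gradient step matching each teacher's action distribution.

    Assumes teachers[i] returns action logits for task i and batches[i]
    is a tensor of observations collected on that task.
    """
    optimizer.zero_grad()
    loss = 0.0
    for task_id, (teacher, obs) in enumerate(zip(teachers, batches)):
        with torch.no_grad():
            target = F.softmax(teacher(obs) / temperature, dim=-1)
        log_probs = F.log_softmax(student(obs, task_id) / temperature, dim=-1)
        # KL(teacher || student): the usual policy-distillation objective.
        loss = loss + F.kl_div(log_probs, target, reduction="batchmean")
    loss.backward()
    optimizer.step()
    return loss.item()
```

Minimizing the KL divergence between teacher and student action distributions is the standard policy-distillation loss; in this reading, the shared encoder is the candidate state representation on which criteria like variable selection, action separability, and robustness to new tasks would be evaluated.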
