Paper Title

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Authors

Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes

Abstract

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents capable of solving multiple tasks and reaching a diverse set of objectives. How to specify and ground these goals in such a way that we can both reliably reach goals during training and generalize to new goals during evaluation remains an open area of research. Defining goals in the space of noisy and high-dimensional sensory inputs poses a challenge for training goal-conditioned agents, and even for generalization to novel goals. We propose to address this by learning factorial representations of goals and processing the resulting representation via a discretization bottleneck, for coarser goal specification, through an approach we call DGRL. We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups, by experimentally evaluating this method on tasks ranging from maze environments to complex robotic navigation and manipulation. Additionally, we prove a theorem lower-bounding the expected return on out-of-distribution goals, while still allowing for goals with expressive combinatorial structure.
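To make the core idea concrete, below is a minimal sketch of a factorial discretization bottleneck for goal embeddings: the continuous goal representation is split into independent factors, and each factor is snapped to its nearest entry in a small per-factor codebook with a straight-through gradient. All names, dimensions, and the VQ-style quantization scheme here are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a factorial discretization bottleneck for goal representations.
# The module, its name, and all sizes are hypothetical; DGRL's real
# architecture may differ in details.
import torch
import torch.nn as nn

class FactorialDiscreteBottleneck(nn.Module):
    def __init__(self, goal_dim=64, n_factors=8, codes_per_factor=16):
        super().__init__()
        assert goal_dim % n_factors == 0
        self.n_factors = n_factors
        self.factor_dim = goal_dim // n_factors
        # One small codebook per factor: each slice of the goal embedding
        # is quantized independently, giving a combinatorial discrete code.
        self.codebooks = nn.Parameter(
            torch.randn(n_factors, codes_per_factor, self.factor_dim))

    def forward(self, z):
        # z: (batch, goal_dim) continuous goal embedding from an encoder.
        b = z.shape[0]
        z = z.view(b, self.n_factors, self.factor_dim)
        # Squared distance of each factor to every code in its codebook:
        # result shape (batch, n_factors, codes_per_factor).
        d = ((z.unsqueeze(2) - self.codebooks.unsqueeze(0)) ** 2).sum(-1)
        idx = d.argmin(-1)  # discrete code index per factor
        zq = torch.stack(
            [self.codebooks[f][idx[:, f]] for f in range(self.n_factors)],
            dim=1)
        # Straight-through estimator: the forward pass uses the quantized
        # codes, while gradients flow back to the continuous embedding.
        zq = z + (zq - z).detach()
        return zq.view(b, -1), idx

# Usage: discretize goal embeddings before conditioning a policy on them.
bottleneck = FactorialDiscreteBottleneck()
z = torch.randn(4, 64)
goal_code, factor_ids = bottleneck(z)  # shapes: (4, 64) and (4, 8)
```

One design note implied by the abstract: because each factor picks from its own codebook, the set of expressible goals grows combinatorially (codes_per_factor ** n_factors) while each individual goal remains a coarse, discrete specification.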
