Paper Title
Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning
Paper Authors
Paper Abstract
To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep. One way to learn such behaviour is via meta-learning. However, many existing methods rely on dense rewards for meta-training and can fail catastrophically if the rewards are sparse. Without a suitable reward signal, the need for exploration during meta-training is exacerbated. To address this, we propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space (where hyper-states represent the environment state and the agent's task belief). We show empirically that HyperX meta-learns better task exploration and adapts more successfully to new tasks than existing methods.
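As a rough illustration of the idea described in the abstract, the sketch below shows one way an exploration bonus over approximate hyper-states (the environment state concatenated with the agent's task belief) could be computed, here using a random-network-distillation-style novelty signal. The class name `HyperStateNoveltyBonus`, the network sizes, and the coefficient `beta` are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' code): a novelty-based reward bonus over
# approximate hyper-states, i.e. (environment state, task belief) pairs.
import torch
import torch.nn as nn


class HyperStateNoveltyBonus(nn.Module):
    """Bonus that is large for rarely visited (state, belief) hyper-states."""

    def __init__(self, state_dim: int, belief_dim: int, feature_dim: int = 64):
        super().__init__()
        hyper_dim = state_dim + belief_dim
        # A fixed, randomly initialised target network and a trained predictor.
        self.target = nn.Sequential(nn.Linear(hyper_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feature_dim))
        self.predictor = nn.Sequential(nn.Linear(hyper_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feature_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)

    def forward(self, state: torch.Tensor, belief: torch.Tensor) -> torch.Tensor:
        hyper_state = torch.cat([state, belief], dim=-1)
        # The prediction error is high for novel hyper-states, so it can serve
        # as an exploration bonus; the predictor is trained to minimise this
        # same error on visited hyper-states, shrinking the bonus over time.
        return (self.predictor(hyper_state) - self.target(hyper_state)).pow(2).mean(-1)


# During meta-training, the bonus would be added to the environment reward,
# e.g. r_total = r_env + beta * bonus(state, belief), with beta a tunable weight.
```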