Paper Title

Deep Intrinsically Motivated Exploration in Continuous Control

Paper Authors

Baturay Saglam, Suleyman S. Kozat

Paper Abstract

In continuous control, exploration is often performed through undirected strategies, in which the parameters of the networks or the selected actions are perturbed by random noise. Although deep versions of undirected exploration have been shown to improve the performance of on-policy methods, they introduce excessive computational complexity and are known to fail in the off-policy setting. Intrinsically motivated exploration is an effective alternative to undirected strategies, but it is usually studied in discrete action domains. In this paper, we investigate how intrinsic motivation can be effectively combined with deep reinforcement learning in the control of continuous systems to obtain directed exploratory behavior. We adapt existing theories on animal motivational systems to the reinforcement learning paradigm and introduce a novel and scalable directed exploration strategy. The introduced approach, motivated by the maximization of the value function's error, can benefit from a collected set of experiences by extracting useful information, and unifies the intrinsic exploration motivations in the literature under a single exploration objective. An extensive set of empirical studies demonstrates that our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and significantly outperforms undirected strategies.
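
The abstract states the high-level objective (directed exploration driven by the maximization of the value function's error, applied off-policy to replayed experiences) but not the authors' concrete update rules. The minimal Python sketch below is therefore one plausible reading, not the paper's algorithm: an intrinsic bonus proportional to the magnitude of the one-step TD error is folded into the reward during off-policy replay, and action selection is contrasted against an undirected Gaussian-noise baseline. All names here (beta, update_from_replay, explore_action, and the one-step simulate model) are hypothetical.

```python
# Illustrative sketch only, assuming a toy linear value function V(s) = w . s
# and an intrinsic bonus equal to |TD error|; not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
gamma, beta, lr = 0.99, 0.5, 1e-2   # discount, bonus weight, step size
w = np.zeros(4)                     # weights of a linear value function

def V(s):
    return float(w @ s)

def td_error(s, r, s_next, done):
    """One-step TD error of the current value estimate."""
    return r + (0.0 if done else gamma * V(s_next)) - V(s)

def intrinsic_bonus(s, r, s_next, done):
    """Directed-exploration bonus: large where the value estimate is wrong."""
    return beta * abs(td_error(s, r, s_next, done))

def undirected_action(greedy_a, sigma=0.1):
    """Undirected baseline from the abstract: Gaussian noise on the action."""
    return greedy_a + rng.normal(scale=sigma, size=np.shape(greedy_a))

def explore_action(candidates, simulate):
    """Directed alternative: pick the candidate action whose simulated
    one-step outcome (s, r, s_next, done) maximizes the value error.
    `simulate` is an assumed one-step model, purely for illustration."""
    return max(candidates, key=lambda a: abs(td_error(*simulate(a))))

def update_from_replay(buffer, batch_size=32):
    """Off-policy semi-gradient TD(0) on replayed transitions, with the
    intrinsic bonus added to the extrinsic reward before the target."""
    global w
    idx = rng.choice(len(buffer), size=min(batch_size, len(buffer)),
                     replace=False)
    for i in idx:
        s, r, s_next, done = buffer[i]
        r_total = r + intrinsic_bonus(s, r, s_next, done)
        target = r_total + (0.0 if done else gamma * V(s_next))
        w += lr * (target - V(s)) * s

# Toy usage: random transitions in a 4-dimensional state space.
buffer = [(rng.normal(size=4), rng.normal(), rng.normal(size=4), False)
          for _ in range(100)]
update_from_replay(buffer)
```

Under this reading, the bonus vanishes as the value function becomes accurate, so exploration is automatically concentrated on poorly understood regions of the state space rather than spread uniformly by noise.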
