Paper Title

Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning

Paper Authors

Jinning Li, Chen Tang, Masayoshi Tomizuka, Wei Zhan

Paper Abstract

Offline reinforcement learning (RL) has shown potential in many safety-critical robotics tasks where exploration is risky and expensive. However, it still struggles to acquire skills in temporally extended tasks. In this paper, we study the problem of offline RL for temporally extended tasks. We propose a hierarchical planning framework consisting of a low-level goal-conditioned RL policy and a high-level goal planner. The low-level policy is trained via offline RL. We improve offline training with a perturbed goal sampling process so that the policy can handle out-of-distribution goals. The high-level planner selects intermediate sub-goals by taking advantage of model-based planning methods. It plans over future sub-goal sequences based on the learned value function of the low-level policy. We adopt a Conditional Variational Autoencoder (CVAE) to sample meaningful high-dimensional sub-goal candidates and to solve the high-level long-term strategy optimization problem. We evaluate our proposed method on long-horizon driving and robot navigation tasks. Experiments show that, on these complex tasks, our method outperforms baselines with different hierarchical designs as well as regular planners without hierarchy.
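To make the high-level planning step concrete, below is a minimal illustrative sketch in Python/PyTorch of how a planner could rank CVAE-sampled sub-goal sequences with the low-level policy's learned value function. All names (`plan_subgoal`, `cvae_decoder`, `value_fn`) and the additive scoring rule are assumptions made for illustration, not the authors' released implementation.

```python
import torch

# Illustrative sketch of the high-level planning step described above.
# Assumptions (not the authors' released code): a trained CVAE decoder
# with a `latent_dim` attribute that maps (latent, state) to a sequence
# of sub-goals, and a learned goal-conditioned value function V(s, g)
# obtained from low-level offline RL. Goals are assumed to live in
# (a projection of) the state space, as is common in goal-conditioned RL.

def plan_subgoal(state, cvae_decoder, value_fn, num_candidates=64, horizon=3):
    """Return the first sub-goal of the highest-scoring candidate sequence."""
    states = state.unsqueeze(0).expand(num_candidates, -1)

    # Sample latent codes from the CVAE prior; decoding them conditioned
    # on the current state yields sub-goal sequences that stay close to
    # the offline data distribution.
    latents = torch.randn(num_candidates, cvae_decoder.latent_dim)
    goal_seqs = cvae_decoder(latents, states)  # (N, horizon, goal_dim)

    # Score each sequence with the low-level value function. The additive
    # score V(s, g_1) + sum_k V(g_k, g_{k+1}) is a simple stand-in for the
    # paper's long-term strategy objective.
    scores = value_fn(states, goal_seqs[:, 0])
    for k in range(horizon - 1):
        scores = scores + value_fn(goal_seqs[:, k], goal_seqs[:, k + 1])

    best = torch.argmax(scores)
    # The chosen sub-goal is handed to the goal-conditioned low-level policy.
    return goal_seqs[best, 0]
```

At execution time such a planner would run in a receding-horizon loop: the low-level policy pursues the returned sub-goal for a fixed number of steps, after which the planner re-samples and re-scores candidates from the new state.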
