Graph-Structured Policy Learning for Multi-Goal Manipulation Tasks

Paper Title

Graph-Structured Policy Learning for Multi-Goal Manipulation Tasks

Authors

Klee, David; Biza, Ondrej; Platt, Robert

Abstract

Multi-goal policy learning for robotic manipulation is challenging. Prior successes have used state-based representations of the objects or provided demonstration data to facilitate learning. In this paper, by hand-coding a high-level discrete representation of the domain, we show that policies to reach dozens of goals can be learned with a single network using Q-learning from pixels. The agent focuses learning on simpler, local policies which are sequenced together by planning in the abstract space. We compare our method against standard multi-goal RL baselines, as well as other methods that leverage the discrete representation, on a challenging block construction domain. We find that our method can build more than a hundred different block structures, and demonstrate forward transfer to structures with novel objects. Lastly, we deploy the policy learned in simulation on a real robot.
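The abstract describes sequencing simpler local policies by planning in a hand-coded discrete abstract space. The sketch below illustrates only the planning half of that idea, under assumptions that are not taken from the paper: abstract states are hypothetical sets of block towers, edges are single-block moves, and breadth-first search returns a chain of abstract subgoals. The learned, pixel-based local policies that would achieve each subgoal are not shown. This is not the authors' implementation, and the names `plan`, `neighbors`, `Tower`, and `AbstractState` are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

# Hypothetical abstract state: a set of towers, each tower a tuple of
# block names listed bottom to top. This encoding is an illustrative
# assumption, not the paper's representation.
Tower = Tuple[str, ...]
AbstractState = FrozenSet[Tower]


def neighbors(state: AbstractState) -> Iterator[AbstractState]:
    """Yield abstract states reachable by moving one top block."""
    towers = list(state)
    for i, src in enumerate(towers):
        block, rest = src[-1], src[:-1]
        others = towers[:i] + towers[i + 1:]
        base = [rest] if rest else []
        # Put the block on the table as a new single-block tower,
        # unless it is already alone on the table.
        if rest:
            yield frozenset(others + base + [(block,)])
        # Stack the block on top of each other tower.
        for j, dst in enumerate(others):
            moved = others[:j] + others[j + 1:] + [dst + (block,)]
            yield frozenset(moved + base)


def plan(start: AbstractState, goal: AbstractState) -> Optional[List[AbstractState]]:
    """Breadth-first search over the abstract graph.

    Returns the abstract subgoals from start to goal (inclusive),
    or None if the goal is unreachable.
    """
    parent: Dict[AbstractState, Optional[AbstractState]] = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path: List[AbstractState] = []
            node: Optional[AbstractState] = state
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


if __name__ == "__main__":
    start = frozenset({("a",), ("b",), ("c",)})
    goal = frozenset({("a", "b", "c")})
    path = plan(start, goal)
    if path is not None:
        for current, subgoal in zip(path, path[1:]):
            # A learned goal-conditioned local policy (hypothetical) would
            # be invoked here to reach `subgoal` from `current`.
            print(sorted(current), "->", sorted(subgoal))
```

Breadth-first search is used here only because it returns a sequence with the fewest single-block moves on an unweighted graph; the abstract does not specify which planner the paper uses.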
