Title

Model-Based Visual Planning with Self-Supervised Functional Distances

Authors

Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine

Abstract

A generalist robot must be able to complete a variety of tasks in its environment. One appealing way to specify each task is in terms of a goal observation. However, learning goal-reaching policies with reinforcement learning remains a challenging problem, particularly when hand-engineered reward functions are not available. Learned dynamics models are a promising approach for learning about the environment without rewards or task-directed data, but planning to reach goals with such a model requires a notion of functional similarity between observations and goal states. We present a self-supervised method for model-based visual goal reaching, which uses both a visual dynamics model as well as a dynamical distance function learned using model-free reinforcement learning. Our approach learns entirely using offline, unlabeled data, making it practical to scale to large and diverse datasets. In our experiments, we find that our method can successfully learn models that perform a variety of tasks at test-time, moving objects amid distractors with a simulated robotic arm and even learning to open and close a drawer using a real-world robot. In comparisons, we find that this approach substantially outperforms both model-free and model-based prior methods. Videos and visualizations are available here: http://sites.google.com/berkeley.edu/mbold.
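
The abstract describes reaching visual goals by planning through a learned dynamics model while using a learned dynamical distance to the goal as the planning cost. The snippet below is a minimal sketch of that planning loop only, assuming hypothetical placeholder functions `dynamics_model` and `distance_fn` and a simple random-shooting planner; it is not the authors' implementation, which operates on image observations and trains the distance with model-free reinforcement learning on offline data.

```python
import numpy as np

# Hypothetical stand-ins for the learned components described in the abstract.
def dynamics_model(state, action):
    """Placeholder one-step dynamics model f(s, a) -> s'."""
    return state + 0.1 * action

def distance_fn(state, goal):
    """Placeholder learned dynamical distance d(s, g)."""
    return np.linalg.norm(state - goal)

def plan_first_action(state, goal, horizon=5, num_samples=256, action_dim=2, rng=None):
    """Random-shooting planner: sample action sequences, roll them out with the
    dynamics model, and rank them by the distance of the final predicted state
    to the goal. Returns the first action of the best sequence (MPC-style)."""
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, action_dim))
    costs = np.empty(num_samples)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            s = dynamics_model(s, a)
        costs[i] = distance_fn(s, goal)
    return candidates[np.argmin(costs)][0]

if __name__ == "__main__":
    state, goal = np.zeros(2), np.ones(2)
    print("first planned action:", plan_first_action(state, goal))
```

In practice the planner would be rerun after every executed action, replanning from the newly observed state, with the learned distance standing in for a hand-engineered reward.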
