Learning Modular Robot Locomotion from Demonstrations

Paper title

Learning Modular Robot Locomotion from Demonstrations

Authors

Julian Whitman, Howie Choset

Abstract

Modular robots can be reconfigured to create a variety of designs from a small set of components. But constructing a robot's hardware is not enough on its own: each robot also needs a controller. One could create controllers for some designs individually, but developing policies for additional designs can be time-consuming. This work presents a method that uses demonstrations from one set of designs to accelerate policy learning for additional designs. We leverage a learning framework in which a graph neural network is made up of modular components, each corresponding to a type of module (e.g., a leg, wheel, or body); these components can be recombined to learn from multiple designs at once. In this paper we develop a combined reinforcement and imitation learning algorithm. Our method is novel because the policy is optimized, within a single objective function, both to maximize a reward for one design and to imitate demonstrations from different designs. We show that when the modular policy is optimized with this combined objective, demonstrations from one set of designs influence how the policy behaves on a different design, decreasing the number of training iterations needed.
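
The abstract does not spell out the exact form of the combined objective, so the following is only a minimal sketch of the idea under stated assumptions. It replaces the graph neural network with one linear map per module type (no message passing), uses a REINFORCE-style surrogate for the reinforcement term and a mean-squared behavior-cloning error for the imitation term, and mixes them with a hypothetical weight `beta`. All names, sizes, and the random data are illustrative and are not taken from the paper.

```python
import numpy as np

OBS_DIM, ACT_DIM = 3, 1  # hypothetical per-module observation/action sizes


def init_params(module_types, rng):
    """One weight matrix per module type, shared by every design that uses it."""
    return {t: 0.1 * rng.standard_normal((ACT_DIM, OBS_DIM)) for t in module_types}


def policy_mean(params, design, obs):
    """Mean action for a design: each module applies the weights of its type.

    design: list of module-type names; obs: array of shape (len(design), OBS_DIM).
    """
    return np.concatenate([params[t] @ o for t, o in zip(design, obs)])


def rl_loss(params, design, obs_batch, act_batch, advantages, std=0.5):
    """REINFORCE-style surrogate: minus advantage-weighted Gaussian log-likelihood."""
    total = 0.0
    for obs, act, adv in zip(obs_batch, act_batch, advantages):
        mean = policy_mean(params, design, obs)
        log_prob = -0.5 * np.sum(((act - mean) / std) ** 2)  # up to a constant
        total += -adv * log_prob
    return total / len(obs_batch)


def imitation_loss(params, demos):
    """Behavior-cloning error (MSE) on demonstrations from other designs."""
    errors = [np.mean((policy_mean(params, design, obs) - act) ** 2)
              for design, obs, act in demos]
    return float(np.mean(errors))


def combined_loss(params, target, demos, beta=1.0):
    """Single objective: reward term on the target design plus imitation term."""
    return rl_loss(params, *target) + beta * imitation_loss(params, demos)


rng = np.random.default_rng(0)
params = init_params(["body", "leg", "wheel"], rng)

# Demonstration from one design (body + two legs); random placeholder data.
demo_design = ["body", "leg", "leg"]
demo_obs = rng.standard_normal((3, OBS_DIM))
demos = [(demo_design, demo_obs, policy_mean(params, demo_design, demo_obs))]

# Rollout data for a different target design (body + leg + wheel).
target_design = ["body", "leg", "wheel"]
target = (
    target_design,
    rng.standard_normal((4, 3, OBS_DIM)),  # 4 time steps, 3 modules
    rng.standard_normal((4, 3)),           # sampled actions
    rng.standard_normal(4),                # advantage estimates
)

loss = combined_loss(params, target, demos, beta=1.0)
```

Because the `body` and `leg` weights appear in both terms, a gradient step on `combined_loss` lets the demonstrations from one design shape the policy used on the other, which is the mechanism the abstract describes.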
