Paper Title


Graph Neural Networks for Relational Inductive Bias in Vision-based Deep Reinforcement Learning of Robot Control

Authors

Marco Oliva, Soubarna Banik, Josip Josifovski, Alois Knoll

Abstract


State-of-the-art reinforcement learning algorithms predominantly learn a policy from either a numerical state vector or images. Both approaches generally do not take structural knowledge of the task into account, which is especially prevalent in robotic applications and can benefit learning if exploited. This work introduces a neural network architecture that combines relational inductive bias and visual feedback to learn an efficient position control policy for robotic manipulation. We derive a graph representation that models the physical structure of the manipulator and combines the robot's internal state with a low-dimensional description of the visual scene generated by an image encoding network. On this basis, a graph neural network trained with reinforcement learning predicts joint velocities to control the robot. We further introduce an asymmetric approach of training the image encoder separately from the policy using supervised learning. Experimental results demonstrate that, for a 2-DoF planar robot in a geometrically simplistic 2D environment, a learned representation of the visual scene can replace access to the explicit coordinates of the reaching target without compromising on the quality and sample efficiency of the policy. We further show the ability of the model to improve sample efficiency for a 6-DoF robot arm in a visually realistic 3D environment.
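The core idea — embed the manipulator's kinematic chain as a graph, attach the image encoder's scene code to each joint node, and let a message-passing network output per-joint velocities — can be illustrated with a minimal numpy sketch. All dimensions, weights, and function names below are hypothetical placeholders (the abstract does not specify the architecture's details); the weights are randomly initialised stand-ins for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper's exact dimensions are not given in the abstract.
N_JOINTS = 2   # 2-DoF planar arm
F_NODE = 4     # per-joint features, e.g. joint angle/velocity and padding
F_SCENE = 8    # low-dimensional scene code produced by the image encoder
F_HID = 16     # hidden embedding size

# Kinematic-chain adjacency: bidirectional edges between consecutive joints.
A = np.zeros((N_JOINTS, N_JOINTS))
for i in range(N_JOINTS - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

# Randomly initialised weights stand in for parameters learned with RL.
W_in = rng.normal(0.0, 0.1, (F_NODE + F_SCENE, F_HID))
W_msg = rng.normal(0.0, 0.1, (F_HID, F_HID))
W_out = rng.normal(0.0, 0.1, (F_HID, 1))


def gnn_policy(joint_state, scene_code):
    """One message-passing round over the manipulator graph -> joint velocities."""
    # Broadcast the visual scene encoding to every joint node, then embed.
    x = np.concatenate([joint_state, np.tile(scene_code, (N_JOINTS, 1))], axis=1)
    h = np.tanh(x @ W_in)          # per-node embeddings
    m = np.tanh((A @ h) @ W_msg)   # aggregate messages from neighbouring joints
    return (h + m) @ W_out         # one velocity command per joint


joint_state = rng.normal(size=(N_JOINTS, F_NODE))
scene_code = rng.normal(size=F_SCENE)  # would come from the trained image encoder
v = gnn_policy(joint_state, scene_code)
print(v.shape)  # (2, 1): one velocity command per joint
```

In the asymmetric setup described above, `scene_code` would be produced by an encoder trained separately with supervised learning, while only the graph network's weights are updated by the reinforcement-learning objective.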
