Paper Title
Neural Dynamic Policies for End-to-End Sensorimotor Learning
Paper Authors
Paper Abstract
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces such as torque, joint angle, or end-effector position. This forces the agent to make decisions individually at each timestep in training, and hence, limits the scalability to continuous, high-dimensional, and long-horizon tasks. In contrast, research in classical robotics has, for a long time, exploited dynamical systems as a policy representation to learn robot behaviors via demonstrations. These techniques, however, lack the flexibility and generalizability provided by deep learning or reinforcement learning and have remained under-explored in such settings. In this work, we begin to close this gap and embed the structure of a dynamical system into deep neural network-based policies by reparameterizing action spaces via second-order differential equations. We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space as opposed to prior policy learning methods where actions represent the raw control space. The embedded structure allows end-to-end policy learning for both reinforcement and imitation learning setups. We show that NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks for both imitation and reinforcement learning setups. Project video and code are available at https://shikharbahl.github.io/neural-dynamic-policies/
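To make the abstract's idea of "reparameterizing action spaces via second-order differential equations" concrete, below is a minimal sketch, not the authors' implementation: a small network maps an observation to the parameters (basis-function weights and a goal) of a DMP-style second-order attractor system, which is then integrated to produce a trajectory instead of a single raw action. The class name, dimensions, basis functions, and Euler integration scheme are illustrative assumptions; the paper's exact formulation may differ.

```python
# Sketch only: a policy that predicts dynamical-system parameters and
# integrates them into a trajectory, rather than emitting raw torques.
import torch
import torch.nn as nn


class NeuralDynamicPolicySketch(nn.Module):
    def __init__(self, obs_dim, dof, n_basis=10, alpha=25.0, tau=1.0):
        super().__init__()
        self.dof, self.n_basis = dof, n_basis
        self.alpha, self.beta, self.tau = alpha, alpha / 4.0, tau
        # Network maps an observation to basis-function weights and a goal.
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, dof * n_basis + dof),
        )
        # Fixed radial basis functions over the phase variable x in (0, 1].
        self.centers = torch.exp(-torch.linspace(0, 1, n_basis))
        self.widths = torch.full((n_basis,), float(n_basis) ** 1.5)

    def forward(self, obs, y0, steps=50):
        """Roll out a trajectory from start state y0 given an observation."""
        out = self.net(obs)
        w = out[: self.dof * self.n_basis].view(self.dof, self.n_basis)
        g = out[self.dof * self.n_basis:]                  # predicted goal
        y, yd = y0.clone(), torch.zeros_like(y0)
        x, dt = torch.tensor(1.0), 1.0 / steps
        traj = []
        for _ in range(steps):
            # Forcing term: weighted RBFs over the phase, scaled by (g - y0).
            psi = torch.exp(-self.widths * (x - self.centers) ** 2)
            f = (psi @ w.t()) / (psi.sum() + 1e-8) * x * (g - y0)
            # Second-order attractor: ydd = alpha * (beta * (g - y) - yd) + f
            ydd = self.alpha * (self.beta * (g - y) - yd) + f
            yd = yd + ydd * dt / self.tau
            y = y + yd * dt / self.tau
            x = x - x * dt / self.tau                      # phase decay
            traj.append(y)
        return torch.stack(traj)                           # (steps, dof)


if __name__ == "__main__":
    policy = NeuralDynamicPolicySketch(obs_dim=8, dof=7)
    trajectory = policy(torch.randn(8), y0=torch.zeros(7))
    print(trajectory.shape)  # torch.Size([50, 7])
```

Because the integration is differentiable, gradients from an imitation or reinforcement learning loss on the resulting trajectory can flow back through the dynamical system into the network, which is what allows the end-to-end training the abstract describes.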