Title
Point Cloud Based Reinforcement Learning for Sim-to-Real and Partial Observability in Visual Navigation
Authors
Abstract
Reinforcement Learning (RL), among other learning-based methods, represents a powerful tool for solving complex robotic tasks (e.g., actuation, manipulation, navigation), but one of its most important limitations is the need for real-world data to train these systems. Simulators are one way to address this issue, yet knowledge acquired in simulation does not transfer directly to the real world; this is known as the sim-to-real transfer problem. Previous works focus on the nature of the images used as observations (e.g., textures and lighting), which has proven useful for sim-to-sim transfer, but they neglect other properties of those observations, such as their precise geometric meaning, and therefore fail at robot-to-robot transfer and, consequently, at sim-to-real transfer. We propose a method that learns on an observation space built from point clouds together with environment randomization, generalizing across robots and simulators to achieve sim-to-real transfer while also addressing partial observability. We demonstrate the benefits of our method on the point goal navigation task, where it proves largely unaffected by the unseen scenarios produced by robot-to-robot transfer, outperforms image-based baselines in robot-randomized experiments, and achieves high performance under sim-to-sim conditions. Finally, we perform several experiments validating sim-to-real transfer to a physical domestic robot platform, confirming the out-of-the-box performance of our system.
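The abstract does not say how the point-cloud observations are built. As a minimal sketch, assuming the robot carries a depth camera with known pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) and a known mounting pose, a depth image can be back-projected into metric 3D points and then expressed in the robot's base frame. All function names, the 5 m range cutoff, and the camera-height range below are hypothetical illustrations, not the authors' implementation. The axis convention (camera optical frame: x right, y down, z forward; base frame: x forward, y left, z up) follows the common ROS REP 103 convention.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3 = Tuple[Point, Point, Point]

# Rotation from a camera optical frame (x right, y down, z forward)
# to a robot base frame (x forward, y left, z up), as in ROS REP 103.
OPTICAL_TO_BASE: Matrix3 = (
    (0.0, 0.0, 1.0),
    (-1.0, 0.0, 0.0),
    (0.0, -1.0, 0.0),
)


def depth_to_points(depth: Sequence[Sequence[float]], fx: float, fy: float,
                    cx: float, cy: float, max_range: float = 5.0) -> List[Point]:
    """Back-project a depth image (metres) into camera-frame points (pinhole model)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            # Drop missing (NaN/inf), non-positive, and out-of-range readings.
            if not math.isfinite(z) or z <= 0.0 or z > max_range:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def camera_to_robot(points: Sequence[Point], rot: Matrix3, t: Point) -> List[Point]:
    """Apply the rigid transform p_robot = rot @ p_cam + t to every point."""
    return [
        tuple(sum(rot[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def randomized_extrinsics(rng: random.Random,
                          height_range: Tuple[float, float] = (0.2, 1.2)
                          ) -> Tuple[Matrix3, Point]:
    """Hypothetical robot randomization: sample a camera mounting height per episode."""
    return OPTICAL_TO_BASE, (0.0, 0.0, rng.uniform(*height_range))


if __name__ == "__main__":
    depth = [[1.0, 2.0], [0.0, 1.0]]  # toy 2x2 depth image; 0.0 marks a missing reading
    cam_points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
    rot, t = randomized_extrinsics(random.Random(0))
    print(camera_to_robot(cam_points, rot, t))
```

Because the resulting points carry metric geometry in the robot's own frame, an observation built this way is less tied to one camera's resolution and intrinsics than raw pixels are, which is one plausible reason such an observation space could generalize between robots. Sampling the mounting height per episode is only an illustrative stand-in for the robot randomization the abstract mentions.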