Paper Title

Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping

Authors

Victor R. F. Miranda, Armando A. Neto, Gustavo M. Freitas, Leonardo A. Mozelli

Abstract

In this paper, we study the application of deep reinforcement learning (DRL) algorithms to local navigation problems, in which a robot equipped only with limited-range exteroceptive sensors, such as LiDAR, moves towards a goal location in unknown and cluttered workspaces. Collision-avoidance policies based on DRL offer some advantages, but they are quite susceptible to local minima, since their capacity to learn suitable actions is limited by the sensor range. Because most robots perform tasks in unstructured environments, it is of great interest to seek generalized local navigation policies capable of avoiding local minima, especially in untrained scenarios. To this end, we propose a novel reward function that incorporates map information gained during the training stage, increasing the agent's capacity to deliberate about the best course of action. In addition, we train our artificial neural network (ANN) with the Soft Actor-Critic (SAC) algorithm, which proves more effective than other approaches in the state-of-the-art literature. A set of sim-to-sim and sim-to-real experiments shows that our proposed reward combined with SAC outperforms the compared methods in avoiding both local minima and collisions.
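To make the reward-shaping idea concrete, below is a minimal sketch of what a map-informed shaped reward could look like. This is an illustration, not the authors' exact formulation: the function name `shaped_reward`, the reward magnitudes, and the choice of a precomputed geodesic-distance field as the potential are all assumptions. The key property is that a potential-based term derived from map information can steer the agent around obstacles that a purely straight-line-distance reward cannot see past.

```python
def shaped_reward(dist_to_goal, prev_dist_to_goal,
                  map_potential, prev_map_potential,
                  collided, reached, gamma=0.99):
    """Hypothetical shaped reward for local navigation.

    Combines sparse terminal rewards with (a) a dense progress term
    based on Euclidean distance to the goal and (b) a potential-based
    shaping term derived from map information, e.g. a geodesic
    distance field precomputed on the training map. The shaping term
    is what lets the reward account for obstacles that trap a purely
    distance-driven agent in local minima.
    """
    if reached:
        return 100.0   # illustrative success bonus
    if collided:
        return -100.0  # illustrative collision penalty
    # Dense progress toward the goal (straight-line distance).
    progress = prev_dist_to_goal - dist_to_goal
    # Potential-based shaping F(s, s') = gamma * phi(s') - phi(s),
    # known to preserve the optimal policy (Ng et al., 1999). Here
    # phi is assumed to be the negative geodesic distance to the
    # goal, read from the precomputed training map.
    shaping = gamma * map_potential - prev_map_potential
    return progress + shaping
```

Under the same caveat, training could then be hooked up to an off-the-shelf SAC implementation; the snippet below uses stable-baselines3, and `NavEnv` is a hypothetical Gym-style simulator wrapper whose `step()` returns the shaped reward above:

```python
from stable_baselines3 import SAC

env = NavEnv()  # hypothetical environment producing LiDAR/odometry observations
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
```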
