Title
Path Planning of Cleaning Robot with Reinforcement Learning
Authors
Abstract
Recently, as the demand for cleaning robots has steadily increased, household electricity consumption has also risen. To address this consumption, efficient path planning for cleaning robots has become important and has been studied extensively. However, most studies concern movement along simple path segments rather than a complete path that covers every location. With the emergence of deep learning techniques, reinforcement learning (RL) has been adopted for cleaning robots. However, RL models operate only in the specific cleaning environment they were trained in, not in varied environments; the models must be retrained whenever the cleaning environment changes. To solve this problem, the proximal policy optimization (PPO) algorithm is combined with an efficient path planning scheme that operates in various cleaning environments, using transfer learning (TL), detection of the nearest cleaned tile, reward shaping, and an elite-set construction method. The proposed method is validated through an ablation study and a comparison with conventional methods such as random and zigzag. The experimental results demonstrate that the proposed method improves training performance and convergence speed over the original PPO, and that it outperforms the conventional methods (random, zigzag).
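The abstract's core algorithm is PPO, whose defining feature is a clipped surrogate objective that bounds how far each policy update can move from the previous policy. As a minimal illustration (not the paper's implementation), a per-sample version of that objective can be sketched as follows; the function name and the clipping constant `eps=0.2` are illustrative assumptions:

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate loss (illustrative sketch).

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio for one transition
    advantage: estimated advantage A_t for that transition
    eps:       clipping range (0.2 is a common default, assumed here)

    Returns the negated clipped objective, i.e. a quantity to minimize.
    """
    unclipped = ratio * advantage
    # Clamp the ratio into [1 - eps, 1 + eps] before weighting the advantage.
    clipped_ratio = max(min(ratio, 1.0 + eps), 1.0 - eps)
    clipped = clipped_ratio * advantage
    # Taking the pessimistic (minimum) objective removes the incentive to
    # push the policy far outside the trust region in a single update.
    return -min(unclipped, clipped)

# When the ratio drifts outside [0.8, 1.2], the clipped term caps the update:
print(ppo_clip_loss(1.5, 1.0))   # -> -1.2 (ratio clipped to 1.2)
print(ppo_clip_loss(0.5, -1.0))  # -> 0.8  (ratio clipped to 0.8)
```

In practice this loss is averaged over minibatches of rollout transitions and combined with value-function and entropy terms, which this sketch omits.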