Paper Title


Renaissance Robot: Optimal Transport Policy Fusion for Learning Diverse Skills

Paper Authors

Julia Tan, Ransalu Senanayake, Fabio Ramos

Abstract


Deep reinforcement learning (RL) is a promising approach to solving complex robotics problems. However, the process of learning through trial-and-error interactions is often highly time-consuming, despite recent advancements in RL algorithms. Additionally, the success of RL is critically dependent on how well the reward-shaping function suits the task, which is also time-consuming to design. As agents trained on a variety of robotics problems continue to proliferate, the ability to reuse their valuable learning for new domains becomes increasingly significant. In this paper, we propose a post-hoc technique for policy fusion using Optimal Transport theory as a robust means of consolidating the knowledge of multiple agents that have been trained on distinct scenarios. We further demonstrate that this provides an improved weight initialisation of the neural network policy for learning new tasks, requiring less time and computational resources than either retraining the parent policies or training a new policy from scratch. Ultimately, our results on diverse agents commonly used in deep RL show that specialised knowledge can be unified into a "Renaissance agent", allowing for quicker learning of new skills.
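To give a flavour of OT-based weight fusion, here is a minimal sketch (not the authors' implementation, and with hypothetical function names): with uniform marginals over neurons, the optimal transport plan between two layers reduces to a hard matching, solvable as an assignment problem over a neuron-to-neuron cost matrix; the matched neurons are then averaged to form a fused initialisation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_layers(w_a, w_b):
    """Fuse two same-shape (n_out, n_in) layers via optimal transport.
    With uniform marginals the OT plan is a permutation, found here
    as a minimum-cost assignment between neuron weight vectors."""
    # Cost: squared Euclidean distance between every pair of neurons.
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(axis=-1)
    _, cols = linear_sum_assignment(cost)  # optimal hard matching
    w_b_aligned = w_b[cols]                # permute B's neurons to match A's
    return 0.5 * (w_a + w_b_aligned)       # averaged, aligned initialisation

# Toy check: two "policies" whose layers are permutations of each other.
rng = np.random.default_rng(0)
w_a = rng.normal(size=(4, 3))
w_b = w_a[[2, 0, 3, 1]]        # same neurons, shuffled order
fused = fuse_layers(w_a, w_b)
print(np.allclose(fused, w_a))  # True: the matching undoes the shuffle
```

Averaging weights naively without this alignment step would mix unrelated neurons; the OT matching is what makes post-hoc fusion of independently trained policies meaningful.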
