Paper Title
Multi-Task Reinforcement Learning with Soft Modularization
Paper Authors
Paper Abstract
Multi-task learning is a very challenging problem in reinforcement learning. While training multiple tasks jointly allows the policies to share parameters across different tasks, the optimization problem becomes non-trivial: it remains unclear which parameters in the network should be reused across tasks, and how the gradients from different tasks may interfere with each other. Thus, instead of naively sharing parameters across tasks, we introduce an explicit modularization technique on policy representation to alleviate this optimization issue. Given a base policy network, we design a routing network which estimates different routing strategies to reconfigure the base network for each task. Instead of directly selecting routes for each task, our task-specific policy uses a method called soft modularization to softly combine all the possible routes, which makes it suitable for sequential tasks. We experiment with various robotics manipulation tasks in simulation and show our method improves both sample efficiency and performance over strong baselines by a large margin.
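The core idea in the abstract — softly combining all possible routes through a base network, rather than hard-selecting one module per task — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact architecture: the module count, layer sizes, activation, and the way routing logits are produced from a task embedding are all assumptions for demonstration.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over routing logits
    e = np.exp(x - x.max())
    return e / e.sum()

class SoftModularLayer:
    """One layer of a soft-modularized base network: several parallel
    modules whose outputs are combined with soft routing weights, so
    every route contributes (soft modularization) instead of a hard,
    discrete module selection per task."""

    def __init__(self, n_modules, dim, rng):
        # one weight matrix per module (illustrative initialization)
        self.W = rng.standard_normal((n_modules, dim, dim)) / np.sqrt(dim)

    def forward(self, h, route_logits):
        # route_logits: one logit per module; in the paper these come
        # from a separate routing network conditioned on the task --
        # here they are supplied directly as a stand-in.
        w = softmax(route_logits)                        # soft routing weights
        outs = np.tanh(np.einsum('mij,j->mi', self.W, h))  # each module's output
        return np.einsum('m,mi->i', w, outs)             # soft combination

rng = np.random.default_rng(0)
layer = SoftModularLayer(n_modules=4, dim=8, rng=rng)
h = rng.standard_normal(8)              # hidden state from the previous layer
task_logits = rng.standard_normal(4)    # hypothetical routing-network output
out = layer.forward(h, task_logits)     # reconfigured layer output, shape (8,)
```

Because the combination is a differentiable softmax-weighted sum rather than a discrete choice, gradients flow to every module and to the routing parameters, which is what lets the routing strategy be learned jointly with the base network.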