Paper Title

An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning

Authors

Tianpei Yang, Weixun Wang, Hongyao Tang, Jianye Hao, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Chengwei Zhang, Yujing Hu, Yingfeng Chen, Changjie Fan

Abstract

Transfer Learning has shown great potential to enhance single-agent Reinforcement Learning (RL) efficiency. Similarly, Multiagent RL (MARL) can also be accelerated if agents can share knowledge with each other. However, it remains an open question how an agent should learn from other agents. In this paper, we propose a novel Multiagent Policy Transfer Framework (MAPTF) to improve MARL efficiency. MAPTF learns which agent's policy is the best one for each agent to reuse, and when to terminate it, by modeling multiagent policy transfer as an option learning problem. Furthermore, in practice the option module can only collect all agents' local experiences for its update, due to the partial observability of the environment. In this setting, the agents' experiences may be inconsistent with one another, which can make the option-value estimates inaccurate and unstable. Therefore, we propose a novel option learning algorithm, successor representation option learning, which addresses this by decoupling the environment dynamics from rewards and learning the option-value under each agent's own preference. MAPTF can be easily combined with existing deep RL and MARL approaches, and experimental results show that it significantly boosts the performance of existing methods in both discrete and continuous state spaces.
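To make the core idea of successor representation option learning more concrete, below is a minimal, illustrative sketch in Python. It assumes a tabular setting with one-hot state features, and all names (n_states, n_options, sr_td_update, option_value, etc.) are hypothetical; this is not the authors' implementation, only an instance of the mechanism the abstract describes: the successor representation captures environment dynamics, and per-agent reward weights recover each agent's own option-values, where each option can be read as "reuse some agent's policy".

```python
# Sketch (under the assumptions stated above): a shared successor
# representation (SR) per option, combined with per-agent reward weights,
# so that dynamics are decoupled from rewards.
import numpy as np

n_states, n_options, n_agents = 20, 3, 2
gamma, alpha_sr, alpha_w = 0.95, 0.1, 0.1

# psi[o, s] is the SR of state s under option o: the expected discounted
# visitation of (one-hot) state features when following option o.
psi = np.zeros((n_options, n_states, n_states))
# w[i] is agent i's reward weight vector: r_i(s) is approximated by phi(s) @ w[i].
w = np.zeros((n_agents, n_states))

def phi(s):
    """One-hot state feature (a stand-in for a learned state embedding)."""
    f = np.zeros(n_states)
    f[s] = 1.0
    return f

def sr_td_update(o, s, s_next, o_next):
    """TD update of the shared SR: psi(s, o) <- phi(s) + gamma * psi(s', o')."""
    target = phi(s) + gamma * psi[o_next, s_next]
    psi[o, s] += alpha_sr * (target - psi[o, s])

def reward_update(agent, s, r):
    """Regress an agent-specific reward onto the state features."""
    err = r - phi(s) @ w[agent]
    w[agent] += alpha_w * err * phi(s)

def option_value(agent, s, o):
    """Option-value under the agent's own preference: Q_i(s, o) = psi(s, o) . w_i."""
    return psi[o, s] @ w[agent]

# Example: agent 0 picks the option with the highest value in state 5.
best_option = max(range(n_options), key=lambda o: option_value(0, 5, o))
```

The point of this decomposition is that psi is shared while w is agent-specific: inconsistencies between agents' local experiences only affect the per-agent reward regression, not the shared estimate of the dynamics, which is how decoupling dynamics from rewards can stabilize the option-value estimates.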
