Paper Title
LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning
Paper Authors
Paper Abstract
Cooperative multi-agent reinforcement learning (MARL) has made prominent progress in recent years. For training efficiency and scalability, most of the MARL algorithms make all agents share the same policy or value network. However, in many complex multi-agent tasks, different agents are expected to possess specific abilities to handle different subtasks. In those scenarios, sharing parameters indiscriminately may lead to similar behavior across all agents, which will limit the exploration efficiency and degrade the final performance. To balance the training complexity and the diversity of agent behavior, we propose a novel framework to learn dynamic subtask assignment (LDSA) in cooperative MARL. Specifically, we first introduce a subtask encoder to construct a vector representation for each subtask according to its identity. To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy, which can dynamically group agents with similar abilities into the same subtask. In this way, agents dealing with the same subtask share their learning of specific abilities and different subtasks correspond to different specific abilities. We further introduce two regularizers to increase the representation difference between subtasks and stabilize the training by discouraging agents from frequently changing subtasks, respectively. Empirical results show that LDSA learns reasonable and effective subtask assignment for better collaboration and significantly improves the learning performance on the challenging StarCraft II micromanagement benchmark and Google Research Football.
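The abstract describes two core components: a subtask encoder that builds a vector representation for each subtask from its identity, and an ability-based selection strategy that groups agents with similar abilities onto the same subtask, plus a regularizer that keeps subtask representations distinct. The snippet below is a minimal PyTorch sketch of that idea; the module names, network sizes, dot-product scoring rule, and cosine-similarity penalty are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of the components named in the abstract (illustrative only):
# (1) a subtask encoder mapping one-hot subtask identities to representations,
# (2) an ability-based selector scoring each agent against every subtask,
# (3) a diversity regularizer pushing subtask representations apart.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubtaskEncoder(nn.Module):
    """Maps one-hot subtask identities to subtask representation vectors."""

    def __init__(self, n_subtasks: int, repr_dim: int):
        super().__init__()
        self.n_subtasks = n_subtasks
        self.net = nn.Sequential(
            nn.Linear(n_subtasks, 64), nn.ReLU(), nn.Linear(64, repr_dim)
        )

    def forward(self) -> torch.Tensor:
        ids = torch.eye(self.n_subtasks)   # one identity vector per subtask
        return self.net(ids)               # (n_subtasks, repr_dim)


class AbilityBasedSelector(nn.Module):
    """Scores each agent's ability vector against every subtask representation
    and returns a soft agent-to-subtask assignment distribution."""

    def __init__(self, obs_dim: int, repr_dim: int):
        super().__init__()
        self.ability_net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, repr_dim)
        )

    def forward(self, agent_obs: torch.Tensor, subtask_repr: torch.Tensor):
        ability = self.ability_net(agent_obs)        # (n_agents, repr_dim)
        logits = ability @ subtask_repr.t()          # (n_agents, n_subtasks)
        return F.softmax(logits, dim=-1)             # assignment probabilities


def subtask_diversity_loss(subtask_repr: torch.Tensor) -> torch.Tensor:
    """Illustrative regularizer: penalize pairwise cosine similarity so that
    different subtask representations stay distinguishable."""
    normed = F.normalize(subtask_repr, dim=-1)
    sim = normed @ normed.t()
    off_diag = sim - torch.diag(torch.diagonal(sim))
    return off_diag.abs().mean()


if __name__ == "__main__":
    n_agents, n_subtasks, obs_dim, repr_dim = 5, 3, 16, 8
    encoder = SubtaskEncoder(n_subtasks, repr_dim)
    selector = AbilityBasedSelector(obs_dim, repr_dim)

    obs = torch.randn(n_agents, obs_dim)
    subtask_repr = encoder()
    assignment = selector(obs, subtask_repr)   # (n_agents, n_subtasks)
    print(assignment.shape, subtask_diversity_loss(subtask_repr).item())
```

In this sketch, agents assigned (softly) to the same subtask would share that subtask's policy parameters; the abstract's second regularizer, which discourages agents from switching subtasks too frequently, is omitted here for brevity.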