Paper Title

NVIF: Neighboring Variational Information Flow for Large-Scale Cooperative Multi-Agent Scenarios

Authors

Jiajun Chai, Yuanheng Zhu, Dongbin Zhao

Abstract

Communication-based multi-agent reinforcement learning (MARL) provides information exchange between agents, which promotes cooperation. However, existing methods do not perform well in large-scale multi-agent systems. In this paper, we adopt neighboring communication and propose Neighboring Variational Information Flow (NVIF) to provide efficient communication for agents. It employs a variational auto-encoder to compress the shared information into a latent state. This communication protocol does not depend on a specific task, so it can be pre-trained to stabilize MARL training. Besides, we combine NVIF with Proximal Policy Optimization (NVIF-PPO) and Deep Q Network (NVIF-DQN), and present a theoretical analysis showing that NVIF-PPO can promote cooperation. We evaluate NVIF-PPO and NVIF-DQN on MAgent, a widely used large-scale multi-agent environment, on two tasks with different map sizes. Experiments show that our method outperforms the compared methods and can learn effective and scalable cooperation strategies in large-scale multi-agent systems.
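
The abstract only sketches the mechanism, so below is a minimal illustrative sketch, in PyTorch, of the core idea: a variational auto-encoder that compresses information shared by neighboring agents into a latent state, trainable with a standard ELBO objective independent of any task reward (which is what allows pre-training). The class name NeighborVAE, the mean-pooling aggregation, and all dimensions are assumptions made for illustration, not the paper's actual architecture.

    # Illustrative sketch only; names, dimensions, and the mean-pooling
    # aggregation are assumptions, not the NVIF paper's architecture.
    import torch
    import torch.nn as nn

    class NeighborVAE(nn.Module):
        def __init__(self, msg_dim: int, latent_dim: int, hidden_dim: int = 64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(msg_dim, hidden_dim), nn.ReLU())
            self.mu_head = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|m)
            self.logvar_head = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|m)
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, msg_dim),
            )

        def forward(self, neighbor_msgs: torch.Tensor):
            # neighbor_msgs: (batch, n_neighbors, msg_dim). Mean-pool as a simple
            # permutation-invariant aggregation of the neighbors' messages.
            pooled = neighbor_msgs.mean(dim=1)
            h = self.encoder(pooled)
            mu, logvar = self.mu_head(h), self.logvar_head(h)
            # Reparameterization trick: z = mu + sigma * eps.
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            recon = self.decoder(z)
            # ELBO terms: reconstruction loss plus KL divergence to a unit Gaussian;
            # no task reward appears here, so the module can be pre-trained.
            recon_loss = ((recon - pooled) ** 2).mean()
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
            return z, recon_loss + kl

    # Usage: the latent state z would feed a downstream policy (e.g., PPO or DQN).
    vae = NeighborVAE(msg_dim=32, latent_dim=8)
    msgs = torch.randn(4, 5, 32)   # 4 agents in a batch, 5 neighbors each
    latent, loss = vae(msgs)
    loss.backward()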
