Paper Title

NVIF: Neighboring Variational Information Flow for Large-Scale Cooperative Multi-Agent Scenarios

Authors

Jiajun Chai, Yuanheng Zhu, Dongbin Zhao

Abstract

Communication-based multi-agent reinforcement learning (MARL) provides information exchange between agents, which promotes cooperation. However, existing methods do not perform well in large-scale multi-agent systems. In this paper, we adopt neighboring communication and propose Neighboring Variational Information Flow (NVIF) to provide efficient communication for agents. It employs a variational auto-encoder to compress the shared information into a latent state. This communication protocol does not depend on a specific task, so it can be pre-trained to stabilize MARL training. Besides, we combine NVIF with Proximal Policy Optimization (NVIF-PPO) and Deep Q Network (NVIF-DQN), and present a theoretical analysis showing that NVIF-PPO can promote cooperation. We evaluate NVIF-PPO and NVIF-DQN on MAgent, a widely used large-scale multi-agent environment, on two tasks with different map sizes. Experiments show that our method outperforms the compared methods and can learn effective and scalable cooperation strategies in large-scale multi-agent systems.
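
The abstract only sketches the mechanism, so below is a minimal illustrative sketch, in PyTorch, of the core idea: a variational auto-encoder that compresses information shared by neighboring agents into a latent state, trainable with a standard ELBO objective independent of any task reward (which is what allows pre-training). The class name NeighborVAE, the mean-pooling aggregation, and all dimensions are assumptions made for illustration, not the paper's actual architecture.

    # Illustrative sketch only; names, dimensions, and the mean-pooling
    # aggregation are assumptions, not the NVIF paper's architecture.
    import torch
    import torch.nn as nn

    class NeighborVAE(nn.Module):
        def __init__(self, msg_dim: int, latent_dim: int, hidden_dim: int = 64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(msg_dim, hidden_dim), nn.ReLU())
            self.mu_head = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|m)
            self.logvar_head = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|m)
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, msg_dim),
            )

        def forward(self, neighbor_msgs: torch.Tensor):
            # neighbor_msgs: (batch, n_neighbors, msg_dim). Mean-pool as a simple
            # permutation-invariant aggregation of the neighbors' messages.
            pooled = neighbor_msgs.mean(dim=1)
            h = self.encoder(pooled)
            mu, logvar = self.mu_head(h), self.logvar_head(h)
            # Reparameterization trick: z = mu + sigma * eps.
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            recon = self.decoder(z)
            # ELBO terms: reconstruction loss plus KL divergence to a unit Gaussian;
            # no task reward appears here, so the module can be pre-trained.
            recon_loss = ((recon - pooled) ** 2).mean()
            kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
            return z, recon_loss + kl

    # Usage: the latent state z would feed a downstream policy (e.g., PPO or DQN).
    vae = NeighborVAE(msg_dim=32, latent_dim=8)
    msgs = torch.randn(4, 5, 32)   # 4 agents in a batch, 5 neighbors each
    latent, loss = vae(msgs)
    loss.backward()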
