Paper Title
Skill Discovery of Coordination in Multi-agent Reinforcement Learning
Paper Authors
Paper Abstract
Unsupervised skill discovery drives intelligent agents to explore an unknown environment without task-specific reward signals, and the agents acquire various skills that may be useful when they adapt to new tasks. In this paper, we propose "Multi-agent Skill Discovery" (MASD), a method for discovering skills that correspond to coordination patterns of multiple agents. The proposed method aims to maximize the mutual information between a latent code Z representing skills and the combination of the states of all agents. Meanwhile, it suppresses the empowerment of Z on the state of any single agent via adversarial training. In other words, it sets an information bottleneck to avoid empowerment degeneracy. First, we show the emergence of various skills at the level of coordination in a general particle multi-agent environment. Second, we reveal that the "bottleneck" prevents skills from collapsing onto a single agent and enhances the diversity of learned skills. Finally, we show that the pretrained policies achieve better performance on supervised RL tasks.
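To make the objective described in the abstract concrete, below is a minimal sketch of a MASD-style intrinsic reward: a variational lower bound on I(Z; joint state) from a global discriminator, minus an adversarial penalty when Z is recoverable from any single agent's state alone. The discriminator architecture, the coefficient alpha, and all names (`Discriminator`, `intrinsic_reward`) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a MASD-style intrinsic reward (assumed details: network
# sizes, the max over per-agent penalties, and the weight alpha).
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Predicts a categorical skill code z from an observation vector."""
    def __init__(self, in_dim: int, n_skills: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # unnormalized logits over skill codes


def intrinsic_reward(joint_disc, agent_discs, agent_states, z, n_skills, alpha=0.5):
    """Reward = lower bound on I(Z; joint state)
    minus an adversarial penalty on I(Z; any single agent's state)."""
    log_pz = -torch.log(torch.tensor(float(n_skills)))  # uniform skill prior
    joint_state = torch.cat(agent_states, dim=-1)

    # Global term: log q(z | s_1, ..., s_n) - log p(z)
    joint_logp = torch.log_softmax(joint_disc(joint_state), dim=-1)
    global_term = joint_logp.gather(-1, z.unsqueeze(-1)).squeeze(-1) - log_pz

    # Adversarial term: penalize skills identifiable from a single agent alone
    per_agent = []
    for disc, s in zip(agent_discs, agent_states):
        logp = torch.log_softmax(disc(s), dim=-1)
        per_agent.append(logp.gather(-1, z.unsqueeze(-1)).squeeze(-1) - log_pz)
    adversarial_term = torch.stack(per_agent, dim=0).max(dim=0).values

    return global_term - alpha * adversarial_term


# Usage example: 3 agents, 4-dim local states, 8 skills, batch of 5
n_agents, state_dim, n_skills = 3, 4, 8
joint_disc = Discriminator(n_agents * state_dim, n_skills)
agent_discs = [Discriminator(state_dim, n_skills) for _ in range(n_agents)]
states = [torch.randn(5, state_dim) for _ in range(n_agents)]
z = torch.randint(0, n_skills, (5,))
r = intrinsic_reward(joint_disc, agent_discs, states, z, n_skills)
```

In this sketch the per-agent discriminators are trained to predict Z from local states (the adversaries), while the policy is rewarded for making Z predictable only from the joint state, which is one way to realize the "information bottleneck" the abstract describes.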