Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Paper Title

Calibration of Shared Equilibria in General Sum Partially Observable Markov Games

Authors

Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso

Abstract

Training multi-agent systems (MAS) to achieve realistic equilibria gives us a useful tool to understand and model real-world systems. We consider a general sum partially observable Markov game where agents of different types share a single policy network, conditioned on agent-specific information. This paper aims at i) formally understanding equilibria reached by such agents, and ii) matching emergent phenomena of such equilibria to real-world targets. Parameter sharing with decentralized execution has been introduced as an efficient way to train multiple agents using a single policy network. However, the nature of the resulting equilibria reached by such agents has not yet been studied: we introduce the novel concept of Shared equilibrium as a symmetric pure Nash equilibrium of a certain Functional Form Game (FFG) and prove convergence to the latter for a certain class of games using self-play. In addition, it is important that such equilibria satisfy certain constraints so that MAS are calibrated to real-world data for practical use: we solve this problem by introducing a novel dual-Reinforcement Learning based approach that fits emergent behaviors of agents in a Shared equilibrium to externally-specified targets, and apply our methods to an n-player market example. We do so by calibrating parameters governing distributions of agent types rather than individual agents, which allows both behavior differentiation among agents and coherent scaling of the shared policy network to multiple agents.
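As a rough illustration of the parameter-sharing idea described in the abstract, the sketch below shows one set of policy weights used by every agent, with each agent's behavior differentiated only by the type vector it is conditioned on. This is not the authors' implementation: the linear-softmax policy, the Gaussian type distribution, and all names (`shared_weights`, `sample_agent_type`, `shared_policy`, the dimension constants) are assumptions made for illustration.

```python
import numpy as np

# Hypothetical sizes chosen for illustration only.
OBS_DIM, TYPE_DIM, N_ACTIONS = 4, 2, 3

rng = np.random.default_rng(0)

# One weight matrix shared by every agent (parameter sharing).
shared_weights = rng.normal(size=(OBS_DIM + TYPE_DIM, N_ACTIONS))


def sample_agent_type(mean, std, rng):
    """Draw an agent type from a distribution governed by (mean, std).

    Assumption: a Gaussian is used here; the abstract only says that
    calibration acts on parameters governing the type distributions.
    """
    return rng.normal(loc=mean, scale=std)


def shared_policy(observation, agent_type, weights):
    """Return action probabilities from the shared policy.

    The policy is conditioned on agent-specific information by
    concatenating the agent's type to its observation.
    """
    features = np.concatenate([observation, agent_type])
    logits = features @ weights
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


# Two agents see the same observation but have different types,
# so the same shared weights yield different action distributions.
observation = np.ones(OBS_DIM)
type_a = sample_agent_type(np.array([0.0, 1.0]), np.array([0.1, 0.1]), rng)
type_b = sample_agent_type(np.array([1.0, 0.0]), np.array([0.1, 0.1]), rng)
probs_a = shared_policy(observation, type_a, shared_weights)
probs_b = shared_policy(observation, type_b, shared_weights)
```

In this sketch, a calibration procedure would adjust the `mean` and `std` arguments of the type distributions rather than any per-agent weights, which is why adding more agents does not change the number of policy parameters.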
