Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design

Paper Title

Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design

Authors

Gianluca Brero, Alon Eden, Darshan Chakrabarti, Matthias Gerstgrasser, Amy Greenwald, Vincent Li, David C. Parkes

Abstract

We introduce a reinforcement learning framework for economic design where the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets up the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining followers' response strategies into the leader's learning environment, providing a formulation of the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that the optimal leader's strategy in the Stackelberg game is the optimal policy in our Stackelberg POMDP under a limited set of possible policies, establishing a connection between solving POMDPs and Stackelberg games. We solve our POMDP under a limited set of policy options via the centralized training with decentralized execution framework. For the specific case of followers that are modeled as no-regret learners, we solve an array of increasingly complex settings, including problems of indirect mechanism design where there is turn-taking and limited communication by agents. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of a coarse-correlated equilibrium, extending known results to correlated types.
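
The sketch below is a minimal toy illustration of the leader/follower structure described in the abstract. It is not the authors' method, and it rests on these assumptions:

- The 2x2 game is hypothetical; LEADER_PAYOFF and FOLLOWER_PAYOFF are invented for illustration.
- A single follower is modeled as a Hedge (multiplicative-weights) no-regret learner with full-information feedback.
- The leader's policy space is restricted to a finite grid of mixed-strategy commitments p, the probability on its first action.

Each leader episode first runs the follower's learning phase, which pays the leader nothing. The leader is then paid its expected payoff against the follower's learned strategy. Exhaustive grid search stands in for the centralized-training-with-decentralized-execution RL solver the abstract describes. The names follower_hedge, leader_episode and best_commitment are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical 2x2 game (illustration only, not from the paper).
# Rows: leader actions 0 and 1. Columns: follower actions 0 and 1.
LEADER_PAYOFF = [[1.0, 3.0],
                 [0.0, 2.0]]
FOLLOWER_PAYOFF = [[1.0, 0.0],
                   [0.0, 1.0]]


def softmax(scores: List[float], eta: float) -> List[float]:
    """Hedge weights proportional to exp(eta * cumulative payoff), computed stably."""
    m = max(scores)
    w = [math.exp(eta * (s - m)) for s in scores]
    total = sum(w)
    return [x / total for x in w]


def follower_hedge(p: float, rounds: int = 500, eta: float = 0.5) -> List[float]:
    """Assumed follower model: Hedge against the leader's committed mixed strategy
    (probability p on row 0). Returns the follower's mixed strategy after learning."""
    leader_mix = [p, 1.0 - p]
    cumulative = [0.0, 0.0]
    for _ in range(rounds):
        # Full-information feedback: expected payoff of each follower action this round.
        for a in range(2):
            cumulative[a] += sum(leader_mix[i] * FOLLOWER_PAYOFF[i][a] for i in range(2))
    return softmax(cumulative, eta)


def leader_episode(p: float) -> float:
    """One toy leader episode: the follower learns first (no leader reward),
    then the leader receives its expected payoff against the learned strategy."""
    q = follower_hedge(p)
    leader_mix = [p, 1.0 - p]
    return sum(leader_mix[i] * q[a] * LEADER_PAYOFF[i][a]
               for i in range(2) for a in range(2))


def best_commitment(grid_size: int = 10) -> Tuple[float, float]:
    """Search a limited, finite set of leader policies p in {0, 1/grid_size, ..., 1}.
    Grid search is a stand-in for RL training, chosen only to keep the sketch small."""
    candidates = [i / grid_size for i in range(grid_size + 1)]
    return max(((p, leader_episode(p)) for p in candidates), key=lambda t: t[1])


if __name__ == "__main__":
    p, value = best_commitment()
    print(f"best p = {p:.1f}, leader payoff = {value:.2f}")
```

In this toy game the leader's first action is dominant, so at the simultaneous-move equilibrium the leader earns 1.0. Committing to p = 0.4 instead steers the learning follower to its second action and yields an expected leader payoff of about 2.4. This is the kind of commitment advantage that optimizing over follower learning dynamics is meant to capture.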
