论文标题

多代理多臂匪徒的公平算法

Fair Algorithms for Multi-Agent Multi-Armed Bandits

论文作者

Hossain, Safwan, Micha, Evi, Shah, Nisarg

论文摘要

我们提出了经典多军匪徒问题的多代理变体,其中有$ n $ agents和$ k $臂,并且拉动手臂为每个代理商产生(可能不同的)随机奖励。与经典的多军强盗问题不同,目标是不学习“最好的手臂”。的确,每个代理商都可以感知另一只手臂是她个人最好的。取而代之的是,我们试图在武器上学习公平的分布。利用经济学和计算机科学领域的一长串研究,我们将NASH的社会福利作为我们的公平概念。我们设计了三种经典多臂强盗算法的多代理变体,并表明它们实现了额定性遗憾,现在以丢失的纳什社会福利来衡量。

We propose a multi-agent variant of the classical multi-armed bandit problem, in which there are $N$ agents and $K$ arms, and pulling an arm generates a (possibly different) stochastic reward for each agent. Unlike the classical multi-armed bandit problem, the goal is not to learn the "best arm"; indeed, each agent may perceive a different arm to be the best for her personally. Instead, we seek to learn a fair distribution over the arms. Drawing on a long line of research in economics and computer science, we use the Nash social welfare as our notion of fairness. We design multi-agent variants of three classic multi-armed bandit algorithms and show that they achieve sublinear regret, which is now measured in terms of the lost Nash social welfare.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源