论文标题
凸凸的匪徒学习非刻痕单调游戏
Bandit Learning in Convex Non-Strictly Monotone Games
论文作者
论文摘要
我们在收益信息设置下解决凸台游戏中的NASH Equilibria。我们认为游戏伪级是单调的情况,但不一定严格单调。严格的单调性的放松使学习算法应用于更大类的游戏,例如,零和游戏仅具有凸连接成本功能。我们得出了一种算法,在这种情况下,其迭代可证明会收敛到最小 - 纳什的平衡。 {从单个播放器的角度使用建议的算法,我们将游戏视为在线优化的实例}。通过此镜头,我们量化了算法的遗憾率,并提供了一种选择算法参数以最大程度地降低后悔率的方法。
We address learning Nash equilibria in convex games under the payoff information setting. We consider the case in which the game pseudo-gradient is monotone but not necessarily strictly monotone. This relaxation of strict monotonicity enables application of learning algorithms to a larger class of games, such as, for example, a zero-sum game with a merely convex-concave cost function. We derive an algorithm whose iterates provably converge to the least-norm Nash equilibrium in this setting. {From the perspective of a single player using the proposed algorithm, we view the game as an instance of online optimization}. Through this lens, we quantify the regret rate of the algorithm and provide an approach to choose the algorithm's parameters to minimize the regret rate.