Paper Title


Tight last-iterate convergence rates for no-regret learning in multi-player games

Authors

Noah Golowich, Sarath Pattathil, Constantinos Daskalakis

Abstract


We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games. This result addresses a question of Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such as OG) can be applied to achieve improved guarantees in the multi-agent learning setting. The proof of our upper bound uses a new technique centered around an adaptive choice of potential function at each iteration. We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which includes OG as a special case. As a byproduct of our lower bound analysis we additionally present a proof of a conjecture of Arjevani et al. (2015) which is more direct than previous approaches.
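As a rough illustration of the method the abstract refers to (a sketch only, not the paper's analysis), the optimistic gradient update $z_{t+1} = z_t - \eta\,(2F(z_t) - F(z_{t-1}))$ can be run on a toy bilinear game $\min_x \max_y xy$, whose monotone operator is $F(x,y) = (y, -x)$ with unique equilibrium $(0,0)$. The operator `F`, the step size `eta`, and the iteration count below are illustrative choices, not values from the paper:

```python
import numpy as np

def F(z):
    # Monotone operator of the bilinear game min_x max_y x*y:
    # F(x, y) = (y, -x); its unique equilibrium is (0, 0).
    x, y = z
    return np.array([y, -x])

eta = 0.1                      # constant step size (illustrative)
z = np.array([1.0, 1.0])       # initial point
g_prev = F(z)                  # gradient at the previous iterate
for t in range(2000):
    g = F(z)
    # Optimistic gradient step: the usual gradient step plus a
    # correction using the previously observed gradient.
    z = z - eta * (2 * g - g_prev)
    g_prev = g

print(np.linalg.norm(z))       # distance of the last iterate to equilibrium
```

On this toy game the last iterate converges to the equilibrium, whereas plain simultaneous gradient descent-ascent (dropping the `- g_prev` correction) cycles and diverges; this is the qualitative behavior that motivates studying last-iterate rates for OG.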
