Paper Title
Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism
Paper Authors
Abstract
In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents only need to accumulate gradient feedback received from the whole system, without requiring any between-agent coordination. In the single-agent case, the adaptivity of the proposed method allows us to extend a range of existing results to problems with potentially unbounded delays between playing an action and receiving the corresponding feedback. In the multi-agent case, the situation is significantly more complicated because agents may not have access to a global clock to use as a reference point; to overcome this, we focus on the information that is available for producing each prediction rather than the actual delay associated with each feedback. This allows us to derive adaptive learning strategies with optimal regret bounds, even in a fully decentralized, asynchronous environment. Finally, we also analyze an "optimistic" variant of the proposed algorithm which is capable of exploiting the predictability of slowly varying problems and leads to improved regret bounds.
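To make the single-agent setting concrete, the following is a minimal illustrative sketch of dual averaging with delayed feedback and an AdaGrad-style adaptive step size. It is not the paper's exact scheme: the Euclidean regularizer, the ball constraint, the `grad_oracle` interface, and the per-round `delays` schedule are all assumptions made for this sketch. The key idea from the abstract does appear: the learner simply accumulates whatever gradient feedback has arrived so far, and its step size adapts to that received feedback only.

```python
import numpy as np

def delayed_dual_averaging(grad_oracle, T, d, delays, radius=1.0):
    """Illustrative sketch: single-agent dual averaging under delayed feedback.

    The gradient for the action played at round t only becomes available at
    round t + delays[t]; the learner accumulates gradients as they arrive and
    tunes its step size from the received feedback alone (no global clock,
    no knowledge of the delays is needed).
    """
    G = np.zeros(d)   # running sum of all gradients received so far
    S = 0.0           # running sum of squared norms of received gradients
    pending = {}      # arrival round -> gradients still "in flight"
    xs = []
    for t in range(T):
        # AdaGrad-style adaptive step size, driven only by received feedback.
        eta = radius / np.sqrt(1.0 + S)
        # Lazy (dual-averaging) update, projected onto the Euclidean ball.
        x = -eta * G
        nrm = np.linalg.norm(x)
        if nrm > radius:
            x *= radius / nrm
        xs.append(x)
        # Play x; its feedback will only arrive delays[t] rounds later.
        g = grad_oracle(t, x)
        pending.setdefault(t + delays[t], []).append(g)
        # Incorporate every gradient whose delay has now elapsed.
        # (Feedback scheduled beyond round T is simply never used, which is
        # consistent with tolerating unbounded delays.)
        for g_arr in pending.pop(t, []):
            G += g_arr
            S += float(g_arr @ g_arr)
    return xs
```

For instance, running this on a fixed quadratic loss with a constant delay of 3 rounds, the iterates settle near the minimizer despite every gradient arriving late. The "optimistic" variant discussed in the abstract would additionally add a guess of the next gradient (e.g., the most recently received one) to `G` before computing `x`, which is what yields improved bounds on slowly varying problems.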