Paper Title
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Paper Authors
Paper Abstract
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per-iteration cost, data locality, and communication efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and have been developed separately in various communities. Our algorithmic framework covers local SGD updates as well as synchronous and pairwise gossip updates on adaptive network topologies. We derive universal convergence rates for smooth (convex and non-convex) problems, and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak assumptions (typically improving over prior work in several aspects) and recover (and improve) the best known complexity results for a host of important scenarios, such as cooperative SGD and federated averaging (local SGD).
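The algorithmic template the abstract describes, local SGD steps per worker followed by gossip (neighbor) averaging over a network topology, can be sketched as follows. This is a minimal illustrative toy, not the paper's algorithm: the 1-D quadratic objectives f_i(x) = (x - a_i)^2 / 2, the fixed ring topology, and all parameter values are assumptions chosen so that the consensus optimum is simply the mean of the targets a_i.

```python
import random

def decentralized_sgd(targets, rounds=200, local_steps=2, lr=0.1,
                      noise=0.01, seed=0):
    """Toy decentralized SGD: worker i minimizes f_i(x) = (x - targets[i])^2 / 2.

    Each round: a few noisy local gradient steps per worker, then one gossip
    averaging step with ring neighbors (a doubly stochastic mixing matrix,
    weight 1/3 on self and on each neighbor). The minimizer of the average
    objective is mean(targets), so all iterates should cluster around it.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n  # each worker's local iterate
    for _ in range(rounds):
        # local SGD phase: gradient of f_i is (x - a_i), plus small noise
        for _ in range(local_steps):
            x = [xi - lr * ((xi - ai) + noise * rng.gauss(0, 1))
                 for xi, ai in zip(x, targets)]
        # gossip phase: average with the two ring neighbors
        x = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3 for i in range(n)]
    return x

iterates = decentralized_sgd([1.0, 2.0, 3.0, 4.0])
```

With heterogeneous targets and local steps, the workers retain a small residual disagreement (the heterogeneity effect the abstract's rates quantify), but all iterates end up close to the consensus optimum 2.5.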