Paper Title

LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning

Authors

Tianyi Chen, Yuejiao Sun, Wotao Yin

Abstract

This paper targets solving distributed machine learning problems such as federated learning in a communication-efficient fashion. A class of new stochastic gradient descent (SGD) approaches has been developed, which can be viewed as a stochastic generalization of the recently developed lazily aggregated gradient (LAG) method, justifying the name LASG. LAG adaptively predicts the contribution of each round of communication and performs only the significant ones. It saves communication while maintaining the rate of convergence. However, LAG only works with deterministic gradients, and applying it to stochastic gradients yields poor performance. The key components of LASG are a set of new rules tailored for stochastic gradients that can be implemented to save downloads, uploads, or both. The new algorithms adaptively choose between fresh and stale stochastic gradients and have convergence rates comparable to the original SGD. LASG achieves impressive empirical performance: it typically saves total communication by an order of magnitude.
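To make the adaptive-communication idea concrete, below is a minimal sketch of a lazy-upload variant of SGD under simplifying assumptions: the function name lazy_sgd, its parameters (skip_const, memory), and the difference-based skipping threshold are illustrative stand-ins, not the exact LASG worker/server rules or constants from the paper. Each worker uploads a fresh stochastic gradient only when it differs enough from the last gradient it sent; otherwise the server reuses the stale copy, saving that upload.

```python
import numpy as np

def lazy_sgd(grad_fn, theta0, num_workers, num_rounds, lr=0.1,
             skip_const=1.0, memory=10, seed=0):
    """Sketch of SGD with lazily aggregated (fresh-or-stale) gradients.

    grad_fn(m, theta, rng) returns a stochastic gradient of worker m's
    local loss at theta.  The skipping rule below is a simplified,
    illustrative condition, not the paper's exact LASG rules.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    # Every worker sends its first stochastic gradient; the server keeps
    # these as the "stale" copies it falls back on when an upload is skipped.
    stale = [grad_fn(m, theta, rng) for m in range(num_workers)]
    uploads = num_workers
    recent_steps = []  # squared norms of the last few parameter updates

    for _ in range(num_rounds):
        # Illustrative threshold: proportional to how much the iterates have
        # moved recently (zero in the first round, so every worker uploads).
        thresh = (skip_const / num_workers**2) * sum(recent_steps[-memory:])
        agg = np.zeros_like(theta)
        for m in range(num_workers):
            fresh = grad_fn(m, theta, rng)
            if np.sum((fresh - stale[m]) ** 2) > thresh:
                stale[m] = fresh      # change is significant: upload fresh gradient
                uploads += 1
            # otherwise skip the upload; the server reuses stale[m]
            agg += stale[m]
        step = -lr * agg / num_workers
        recent_steps.append(float(np.sum(step ** 2)))
        theta = theta + step
    return theta, uploads

# Toy usage: worker m holds a noisy quadratic with minimizer centers[m].
centers = [np.array([1.0, -2.0]) * (m + 1) for m in range(4)]
noisy_grad = lambda m, x, rng: (x - centers[m]) + 0.01 * rng.standard_normal(x.shape)
theta, uploads = lazy_sgd(noisy_grad, np.zeros(2), num_workers=4, num_rounds=200)
print(theta, uploads, "uploads out of", 4 * 200, "possible")
```

The point of the sketch is the fresh-versus-stale choice driven by recent iterate movement; the paper's actual rules are scaled so that reusing stale gradients does not degrade the SGD convergence rate.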
