Title
Error Compensated Distributed SGD Can Be Accelerated
Authors
Abstract
Gradient compression is a recent and increasingly popular technique for reducing the communication cost in distributed training of large-scale machine learning models. In this work we focus on developing efficient distributed methods that can work for any compressor satisfying a certain contraction property, which includes both unbiased (after appropriate scaling) and biased compressors such as RandK and TopK. Applied naively, gradient compression introduces errors that either slow down convergence or lead to divergence. A popular technique designed to tackle this issue is error compensation/error feedback. Due to the difficulties associated with analyzing biased compressors, it is not known whether gradient compression with error compensation can be combined with Nesterov's acceleration. In this work, we show for the first time that error compensated gradient compression methods can be accelerated. In particular, we propose and study the error compensated loopless Katyusha method, and establish an accelerated linear convergence rate under standard assumptions. We show through numerical experiments that the proposed method converges with substantially fewer communication rounds than previous error compensated algorithms.
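As a rough, self-contained illustration of the ingredients mentioned in the abstract, the NumPy sketch below defines the TopK and RandK compressors and a single error-feedback step of distributed SGD, in which each worker adds its stored compression error to the gradient before compressing and keeps the residual for the next round. This is a minimal sketch of the generic error-feedback mechanism, not the paper's error compensated loopless Katyusha method, and the helper names `topk_compress`, `randk_compress`, and `ef_sgd_step` are hypothetical names introduced only for this example.

```python
import numpy as np

def topk_compress(x, k):
    """TopK: keep the k largest-magnitude entries of x, zero out the rest (a biased compressor)."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def randk_compress(x, k, rng):
    """RandK: keep k uniformly random entries of x (scaled by d/k it becomes an unbiased compressor)."""
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = x[idx]
    return out

def ef_sgd_step(w, grads, errors, compress, lr):
    """One error-feedback distributed SGD step (illustrative only, not the paper's accelerated method).

    Each worker compresses (gradient + its accumulated error), stores what was lost
    for the next iteration, and the server averages the compressed messages.
    """
    messages = []
    for i, g in enumerate(grads):
        corrected = g + errors[i]      # add the error accumulated from previous compressions
        msg = compress(corrected)      # communicate only the compressed vector
        errors[i] = corrected - msg    # keep the residual for the next round
        messages.append(msg)
    return w - lr * np.mean(messages, axis=0), errors

# Usage sketch: a few workers, TopK with k = 10 as the compressor.
rng = np.random.default_rng(0)
d, n_workers = 100, 4
w = np.zeros(d)
errors = [np.zeros(d) for _ in range(n_workers)]
grads = [rng.standard_normal(d) for _ in range(n_workers)]   # stand-in stochastic gradients
w, errors = ef_sgd_step(w, grads, errors, lambda x: topk_compress(x, 10), lr=0.1)
```

The error vectors are what "error compensation/error feedback" refers to: the information discarded by the (possibly biased) compressor is re-injected into subsequent gradients rather than lost, which is the property the paper builds on when combining compression with acceleration.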