Paper Title
Adaptive Sampling Distributed Stochastic Variance Reduced Gradient for Heterogeneous Distributed Datasets
Paper Authors
Paper Abstract
We study distributed optimization algorithms for minimizing the average of \emph{heterogeneous} functions distributed across several machines, with a focus on communication efficiency. In such settings, naively using the classical stochastic gradient descent (SGD) or its variants (e.g., SVRG) with uniform sampling of machines typically yields poor performance: the convergence rate depends on the maximum Lipschitz constant of the gradients across the devices. In this paper, we propose a novel \emph{adaptive} sampling of machines tailored to these settings. Our method relies on an adaptive estimate of the local Lipschitz constants based on information from past gradients. We show that the new sampling scheme improves the dependence of the convergence rate from the maximum Lipschitz constant to the \emph{average} Lipschitz constant across machines, thereby significantly accelerating convergence. Our experiments demonstrate that our method indeed speeds up the convergence of the standard SVRG algorithm in heterogeneous environments.
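To make the idea concrete, below is a minimal, simplified sketch (not the authors' exact algorithm) of SVRG with importance sampling of machines proportional to running estimates of their local Lipschitz constants. The quadratic per-machine losses, the secant-style Lipschitz estimator, and all function names and hyperparameters are illustrative assumptions for this sketch only.

```python
# Sketch: SVRG over "machines" with adaptive importance sampling, where each
# machine's sampling probability is proportional to an estimate of its local
# Lipschitz constant built from past gradient evaluations.
import numpy as np

rng = np.random.default_rng(0)

n_machines, dim = 8, 5
# Heterogeneous quadratic losses f_i(w) = 0.5 * ||A_i w - b_i||^2 with very
# different curvature (Lipschitz constants) across machines.
scales = np.logspace(0, 1, n_machines)
A = [s * rng.standard_normal((dim, dim)) / np.sqrt(dim) for s in scales]
b = [rng.standard_normal(dim) for _ in range(n_machines)]

def grad(i, w):
    """Gradient of machine i's local objective at w."""
    return A[i].T @ (A[i] @ w - b[i])

def full_grad(w):
    """Gradient of the average objective (one full communication round)."""
    return np.mean([grad(i, w) for i in range(n_machines)], axis=0)

def svrg_adaptive(n_epochs=30, inner_steps=50, step=5e-4):
    w = np.zeros(dim)
    L_est = np.ones(n_machines)            # running local Lipschitz estimates
    for _ in range(n_epochs):
        w_snap = w.copy()
        mu = full_grad(w_snap)             # full gradient at the snapshot
        for _ in range(inner_steps):
            # Sample a machine with probability proportional to its estimate.
            p = L_est / L_est.sum()
            i = rng.choice(n_machines, p=p)
            g_w, g_snap = grad(i, w), grad(i, w_snap)
            # Secant-style update of the local Lipschitz estimate from the
            # two gradients just computed (information from past gradients).
            dw = np.linalg.norm(w - w_snap)
            if dw > 1e-12:
                L_est[i] = max(L_est[i], np.linalg.norm(g_w - g_snap) / dw)
            # SVRG update with importance-sampling correction 1 / (n * p_i),
            # which keeps the variance-reduced gradient estimate unbiased.
            v = (g_w - g_snap) / (n_machines * p[i]) + mu
            w -= step * v
    return w

w_hat = svrg_adaptive()
print("final gradient norm:", np.linalg.norm(full_grad(w_hat)))
```

The key design choice illustrated here is that the importance weights 1/(n p_i) keep the stochastic gradient unbiased under non-uniform sampling, while concentrating samples on machines with larger estimated Lipschitz constants; in the analysis this is what replaces the maximum Lipschitz constant by the average one in the rate.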