Paper Title

Communication-Efficient Local SGD with Age-Based Worker Selection

Authors

Feng Zhu, Jingjing Zhang, Xin Wang

Abstract

A major bottleneck of distributed learning under the parameter-server (PS) framework is the communication cost due to frequent bidirectional transmissions between the PS and workers. To address this issue, local stochastic gradient descent (SGD) and worker selection have been exploited by reducing the communication frequency and the number of participating workers at each round, respectively. However, partial participation can be detrimental to the convergence rate, especially for heterogeneous local datasets. In this paper, to improve communication efficiency and speed up the training process, we develop a novel worker selection strategy named AgeSel. The key enabler of AgeSel is the utilization of the ages of workers to balance their participation frequencies. The convergence of local SGD with the proposed age-based partial worker participation is rigorously established. Simulation results demonstrate that the proposed AgeSel strategy can significantly reduce the number of training rounds needed to achieve a target accuracy, as well as the communication cost. The influence of the algorithm hyper-parameter is also explored to demonstrate the benefit of age-based worker selection.
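To make the idea of age-based worker selection concrete, below is a minimal sketch of local SGD with partial participation where the PS keeps a per-worker "age" (rounds since last participation). This is not the paper's AgeSel algorithm: the abstract does not specify the selection rule, so the sketch simply assumes the PS picks the workers with the largest ages each round and resets their ages afterward. All names (NUM_WORKERS, WORKERS_PER_ROUND, local_sgd, the synthetic least-squares data, etc.) are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each worker holds its own local least-squares objective.
NUM_WORKERS = 10       # total workers M
WORKERS_PER_ROUND = 3  # workers selected per communication round
LOCAL_STEPS = 5        # local SGD steps between communications
LR = 0.05              # learning rate
DIM = 4                # model dimension

# Heterogeneous local data: each worker's targets are drawn around a shifted optimum.
data = []
for m in range(NUM_WORKERS):
    A = rng.normal(size=(50, DIM))
    x_star = rng.normal(size=DIM) + m * 0.1
    b = A @ x_star + 0.01 * rng.normal(size=50)
    data.append((A, b))

def local_sgd(x, A, b, steps, lr):
    """Run a few SGD steps on one worker's local least-squares loss."""
    for _ in range(steps):
        i = rng.integers(len(b))
        grad = A[i] * (A[i] @ x - b[i])
        x = x - lr * grad
    return x

ages = np.zeros(NUM_WORKERS)  # rounds since each worker last participated
x_global = np.zeros(DIM)

for rnd in range(100):
    # Assumed age-based rule: favor the workers that have waited longest,
    # so participation frequencies stay balanced across heterogeneous workers.
    selected = np.argsort(-ages)[:WORKERS_PER_ROUND]

    # Selected workers run local SGD starting from the current global model.
    updates = [local_sgd(x_global.copy(), *data[m], LOCAL_STEPS, LR) for m in selected]

    # The PS averages the returned local models.
    x_global = np.mean(updates, axis=0)

    # Update ages: increment everyone, then reset the participants.
    ages += 1
    ages[selected] = 0
```

The intent of keeping ages, as described in the abstract, is that balancing how often each worker participates mitigates the bias that partial participation introduces when local datasets are heterogeneous, while still sending only a few workers' updates per round to keep communication low.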
