Paper Title
Variance-Reduced Stochastic Quasi-Newton Methods for Decentralized Learning: Part I
Paper Authors
Paper Abstract
In this work, we investigate stochastic quasi-Newton methods for minimizing a finite sum of cost functions over a decentralized network. In Part I, we develop a general algorithmic framework that combines stochastic quasi-Newton approximations with variance reduction so as to achieve fast convergence. At each iteration, each node constructs a local, inexact quasi-Newton direction that asymptotically approaches the global, exact one. To be specific, (i) a local gradient approximation is constructed by using dynamic average consensus to track the average of the variance-reduced local stochastic gradients over the entire network; (ii) the local Hessian inverse approximation is assumed to be positive definite with bounded eigenvalues, and constructions satisfying these assumptions are given in Part II. Compared with existing decentralized stochastic first-order methods, the proposed general framework introduces second-order curvature information without incurring extra sampling or communication. With a fixed step size, we establish the conditions under which the proposed general framework converges linearly to the exact optimal solution.
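The abstract describes a per-node update that mixes neighbors' iterates, scales a tracked gradient estimate by a local Hessian-inverse approximation, and refreshes the tracked gradient via dynamic average consensus with variance-reduced stochastic gradients. The following is a minimal, hypothetical sketch of such a scheme, not the paper's actual algorithm: it assumes an SVRG-style variance-reduction correction, a doubly stochastic mixing matrix `W`, and user-supplied callbacks `local_grad` and `hessian_inv_approx`; all names and the epoch structure are illustrative assumptions.

```python
# Hypothetical sketch: decentralized quasi-Newton steps with gradient tracking
# and SVRG-style variance reduction. Not the paper's notation or exact method.
import numpy as np

def decentralized_vr_quasi_newton(X0, W, data, local_grad, hessian_inv_approx,
                                  alpha=0.01, epochs=50, inner_iters=20, rng=None):
    """X0: (n_nodes, dim) initial local iterates; W: doubly stochastic mixing matrix.
    data[i] is node i's local samples; local_grad(x, samples) returns a gradient;
    hessian_inv_approx(x, samples) returns a (dim, dim) positive definite matrix."""
    rng = rng or np.random.default_rng(0)
    n, dim = X0.shape
    X = X0.copy()
    # Reference point and full local gradients for the SVRG-style correction.
    X_ref = X.copy()
    full_grad = np.stack([local_grad(X_ref[i], data[i]) for i in range(n)])
    # Gradient-tracking variable, initialized to the first local gradients.
    D = full_grad.copy()
    G_old = full_grad.copy()
    for _ in range(epochs):
        for _ in range(inner_iters):
            # Quasi-Newton step: mix with neighbors, then move along the tracked
            # gradient scaled by the local Hessian-inverse approximation.
            H = np.stack([hessian_inv_approx(X[i], data[i]) for i in range(n)])
            X = W @ X - alpha * np.einsum('ijk,ik->ij', H, D)
            # Variance-reduced local stochastic gradients (SVRG correction).
            G_new = np.empty_like(G_old)
            for i in range(n):
                j = rng.integers(len(data[i]))
                sample = data[i][j:j + 1]
                G_new[i] = (local_grad(X[i], sample)
                            - local_grad(X_ref[i], sample)
                            + full_grad[i])
            # Dynamic average consensus: track the network-average gradient.
            D = W @ D + G_new - G_old
            G_old = G_new
        # Refresh the reference point and full local gradients each epoch.
        X_ref = X.copy()
        full_grad = np.stack([local_grad(X_ref[i], data[i]) for i in range(n)])
    return X
```

In this sketch the only extra per-iteration cost relative to a first-order gradient-tracking method is the multiplication by the local Hessian-inverse approximation, consistent with the abstract's claim that curvature information is introduced without extra sampling or communication.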