Paper Title

On Uniform Boundedness Properties of SGD and its Momentum Variants

Paper Authors

Xiaoyu Wang, Mikael Johansson

Paper Abstract

A theoretical, and potentially also practical, problem with stochastic gradient descent is that trajectories may escape to infinity. In this note, we investigate uniform boundedness properties of the iterates and function values along the trajectories of the stochastic gradient descent algorithm and its important momentum variant. Under smoothness and $R$-dissipativity of the loss function, we show that broad families of step-sizes, including the widely used step-decay and cosine (with or without restart) step-sizes, result in uniformly bounded iterates and function values. Several important applications that satisfy these assumptions, including phase retrieval problems, Gaussian mixture models, and some neural network classifiers, are discussed in detail. We further extend the uniform boundedness results for SGD and its momentum variant to a generalized dissipativity condition that covers functions whose tails grow slower than a quadratic. This includes some interesting applications, for example, Bayesian logistic regression and logistic regression with $\ell_1$ regularization.
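To make the objects in the abstract concrete, below is a minimal sketch (not the authors' code) of SGD with heavy-ball momentum on a synthetic least-squares problem, using a cosine step-size schedule with warm restarts in the common SGDR form. All names and hyper-parameters here (`eta_max`, `eta_min`, `period`, `beta`, the problem sizes) are illustrative assumptions, and the quadratic loss is just one simple example of a function satisfying a standard dissipativity condition of the form $\langle \nabla f(x), x \rangle \ge a\|x\|^2 - b$ (an assumption about the paper's exact definition of $R$-dissipativity).

```python
import math
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 10))   # synthetic least-squares data
b = rng.normal(size=100)

def stoch_grad(x):
    """Gradient of 0.5 * (a_i^T x - b_i)^2 for one uniformly sampled row i."""
    i = rng.integers(len(b))
    return A[i] * (A[i] @ x - b[i])

def cosine_restart_lr(t, eta_max=0.1, eta_min=1e-4, period=200):
    """Cosine-annealed step size, restarted every `period` iterations
    (SGDR-style schedule; the constants are illustrative)."""
    s = (t % period) / period
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * s))

x = np.zeros(10)
v = np.zeros(10)   # momentum buffer
beta = 0.9         # heavy-ball momentum coefficient

for t in range(1000):
    g = stoch_grad(x)
    v = beta * v + g                   # heavy-ball momentum update
    x = x - cosine_restart_lr(t) * v   # step with the scheduled step-size

# On this dissipative quadratic problem the iterates stay bounded.
print("final ||x|| =", np.linalg.norm(x))
```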
