Paper Title

Losing momentum in continuous-time stochastic optimisation

Paper Authors

Kexin Jin, Jonas Latz, Chenguang Liu, Alessandro Scagliotti

Paper Abstract

The training of modern machine learning models often consists in solving high-dimensional non-convex optimisation problems that are subject to large-scale data. In this context, momentum-based stochastic optimisation algorithms have become particularly widespread. The stochasticity arises from data subsampling, which reduces computational cost. Both momentum and stochasticity help the algorithm to converge globally. In this work, we propose and analyse a continuous-time model for stochastic gradient descent with momentum. This model is a piecewise-deterministic Markov process that represents the optimiser by an underdamped dynamical system and the data subsampling through stochastic switching. We investigate long-time limits, the subsampling-to-no-subsampling limit, and the momentum-to-no-momentum limit. We are particularly interested in the case of reducing the momentum over time. Under convexity assumptions, we show convergence of our dynamical system to the global minimiser when reducing momentum over time and letting the subsampling rate go to infinity. We then propose a stable, symplectic discretisation scheme to construct an algorithm from our continuous-time dynamical system. In experiments, we study our scheme on convex and non-convex test problems. Additionally, we train a convolutional neural network on an image classification problem. Our algorithm attains competitive results compared to stochastic gradient descent with momentum.
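The abstract does not state the equations, but a minimal sketch of the kind of underdamped dynamics with stochastic switching it describes might read as follows. Here x denotes the parameter, v the velocity, gamma(t) a time-dependent friction (reducing momentum corresponds to increasing friction), f_i the loss on the i-th data subsample, and (i(t)) a Markov jump process whose switching rate plays the role of the subsampling rate; all of this notation is ours and only illustrative, not taken from the paper.

\dot{x}(t) = v(t), \qquad \dot{v}(t) = -\gamma(t)\, v(t) - \nabla f_{i(t)}\bigl(x(t)\bigr).

Between the jumps of i(t) the evolution is deterministic, which is why such a process is called piecewise deterministic. Similarly, the authors' symplectic discretisation is not given here, but a generic semi-implicit (symplectic-Euler-type) step for the system above could look like the following Python sketch; the function names, step size, and friction schedule are hypothetical.

# Illustrative only: a semi-implicit (symplectic-Euler-type) step for the
# damped system sketched above; this is an assumption, not the authors' scheme.
import numpy as np

def momentum_step(x, v, grad, gamma, h):
    # Update the velocity with the current gradient and friction,
    # then update the position with the *new* velocity.
    v = v - h * (gamma * v + grad(x))
    x = x + h * v
    return x, v

# Toy usage on the quadratic loss f(x) = 0.5 * ||x||^2 (gradient is x),
# with a friction gamma that grows over time, i.e. momentum is reduced.
grad = lambda x: x
x, v = np.array([1.0, -2.0]), np.zeros(2)
for k in range(2000):
    gamma = 0.1 + 1e-3 * k
    x, v = momentum_step(x, v, grad, gamma, h=1e-2)
print(x)  # should have moved close to the minimiser at 0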
