Paper Title
Convergence of gradient descent for deep neural networks
Paper Authors
Paper Abstract
This article presents a criterion for convergence of gradient descent to a global minimum, which is then used to show that gradient descent with proper initialization converges to a global minimum when training any feedforward neural network with smooth and strictly increasing activation functions, provided that the input dimension is greater than or equal to the number of data points. The main difference from prior work is that the width of the network can be a fixed number, rather than growing as some multiple or power of the number of data points.
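To make the regime concrete, below is a minimal, hypothetical sketch (plain NumPy, not the paper's method) of the setting the abstract describes: n data points with input dimension d >= n, a fixed-width network with a smooth, strictly increasing activation (tanh here), trained by plain gradient descent on squared loss. The paper's specific initialization scheme is not reproduced; the random initialization below is only illustrative.

```python
import numpy as np

# Illustrative setup (assumed values): n data points, input dimension
# d >= n, and a hidden width that is fixed independently of n.
rng = np.random.default_rng(0)
n, d, width = 8, 16, 4

X = rng.normal(size=(n, d))   # inputs
y = rng.normal(size=(n, 1))   # targets

# One hidden layer: f(x) = tanh(x @ W1) @ W2, with tanh smooth and
# strictly increasing, as the abstract requires.
W1 = rng.normal(size=(d, width)) / np.sqrt(d)
W2 = rng.normal(size=(width, 1)) / np.sqrt(width)

lr = 0.1
for step in range(2000):
    H = np.tanh(X @ W1)          # hidden activations, shape (n, width)
    pred = H @ W2                # predictions, shape (n, 1)
    resid = pred - y
    loss = 0.5 * np.mean(resid ** 2)

    # Backpropagation for the squared loss.
    g_pred = resid / n
    g_W2 = H.T @ g_pred
    g_H = g_pred @ W2.T
    g_W1 = X.T @ (g_H * (1.0 - H ** 2))  # tanh'(z) = 1 - tanh(z)^2

    W1 -= lr * g_W1
    W2 -= lr * g_W2

# In this d >= n regime the training loss typically approaches zero,
# consistent with convergence to a global minimum of the training objective.
print(f"final training loss: {loss:.2e}")
```

Note that this toy run only illustrates the d >= n setting with a fixed width; the paper's actual guarantee depends on its convergence criterion and initialization, which the sketch does not implement.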