Paper Title

Limitations of neural network training due to numerical instability of backpropagation

Paper Authors

Clemens Karner, Vladimir Kazeev, Philipp Christian Petersen

Paper Abstract

We study the training of deep neural networks by gradient descent where floating-point arithmetic is used to compute the gradients. In this framework and under realistic assumptions, we demonstrate that it is highly unlikely to find ReLU neural networks that maintain, in the course of training with gradient descent, superlinearly many affine pieces with respect to their number of layers. In virtually all approximation theoretical arguments that yield high-order polynomial rates of approximation, sequences of ReLU neural networks with exponentially many affine pieces compared to their numbers of layers are used. As a consequence, we conclude that approximating sequences of ReLU neural networks resulting from gradient descent in practice differ substantially from theoretically constructed sequences. The assumptions and the theoretical results are compared to a numerical study, which yields concurring results.
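The abstract contrasts the networks found by gradient descent in practice with theoretically constructed sequences whose number of affine pieces grows exponentially in the number of layers. As a minimal illustration of the latter (a standard construction from the approximation-theory literature, not code from the paper), the sketch below composes a one-hidden-layer ReLU "hat" function L times, producing a sawtooth with 2^L affine pieces, and counts the pieces by sampling slopes on a fine grid; the function names, grid size, and tolerance are illustrative choices.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # One-hidden-layer ReLU "hat" on [0, 1]: rises from 0 to 1 on [0, 1/2], falls back to 0 on [1/2, 1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def sawtooth(x, depth):
    # Composing the hat `depth` times yields a sawtooth with 2**depth affine pieces,
    # i.e. exponentially many pieces relative to the number of layers.
    for _ in range(depth):
        x = hat(x)
    return x

def count_affine_pieces(f, a=0.0, b=1.0, n=200_001, tol=1e-6):
    # Estimate the number of affine pieces of a univariate piecewise-linear function
    # by sampling on a fine grid and counting where the finite-difference slope changes.
    xs = np.linspace(a, b, n)
    slopes = np.diff(f(xs)) / np.diff(xs)
    change_idx = np.where(np.abs(np.diff(slopes)) > tol)[0]
    if change_idx.size == 0:
        return 1
    # A kink that falls strictly between two grid points shows up as two consecutive
    # slope changes; merge such runs so each breakpoint is counted once.
    breakpoints = int(np.sum(np.diff(change_idx) > 1)) + 1
    return breakpoints + 1

if __name__ == "__main__":
    for L in range(1, 8):
        pieces = count_affine_pieces(lambda x, L=L: sawtooth(x, L))
        print(f"depth {L}: {pieces} affine pieces (expected 2**{L} = {2 ** L})")
```

The paper's point, in these terms, is that sequences of networks with this exponential piece growth are what the high-order approximation rates rely on, whereas networks trained by gradient descent in floating-point arithmetic are unlikely to maintain even superlinearly many pieces in the depth.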
