Paper Title

On the existence of infinitely many realization functions of non-global local minima in the training of artificial neural networks with ReLU activation

Paper Authors

Shokhrukh Ibragimov, Arnulf Jentzen, Timo Kröger, Adrian Riekert

Paper Abstract

Gradient descent (GD) type optimization schemes are the standard instruments for training fully connected feedforward artificial neural networks (ANNs) with rectified linear unit (ReLU) activation and can be considered as temporal discretizations of solutions of gradient flow (GF) differential equations. It has recently been proved that, in the training of ANNs with one hidden layer and ReLU activation, the risk of every bounded GF trajectory converges to the risk of a critical point. Taking this into account, one of the key research issues in the mathematical convergence analysis of GF trajectories and GD type optimization schemes is to study sufficient and necessary conditions for critical points of the risk function and, thereby, to obtain an understanding of the appearance of critical points in dependence on the problem parameters such as the target function. In the first main result of this work we prove, in the training of ANNs with one hidden layer and ReLU activation, that for every $a, b \in \mathbb{R}$ with $a < b$ and every arbitrarily large $\delta > 0$ there exists a Lipschitz continuous target function $f \colon [a,b] \to \mathbb{R}$ such that for every number $H > 1$ of neurons on the hidden layer the risk function has uncountably many different realization functions of non-global local minimum points whose risks are strictly larger than the sum of the risk of the global minimum points and the arbitrarily large $\delta$. In the second main result of this work we show, in the training of ANNs with one hidden layer and ReLU activation, in the special situation where there is only one neuron on the hidden layer and where the target function is continuous and piecewise polynomial, that there exist at most finitely many different realization functions of critical points.
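For readers unfamiliar with the terminology, the following LaTeX sketch spells out the shallow-network setup that is standard in this line of work. The notation ($\mathcal{N}_\theta$, $\mathcal{L}$) and the choice of the unweighted $L^2$-risk on $[a,b]$ are assumptions based on the usual conventions; the paper's exact parametrization and normalization may differ.

```latex
% Standard shallow ReLU parametrization (assumed; the paper's exact
% normalization may differ). Parameters:
%   theta = (w_1, ..., w_H, b_1, ..., b_H, v_1, ..., v_H, c) in R^{3H+1}.
\[
  \mathcal{N}_\theta(x) = c + \sum_{i=1}^{H} v_i \max\{ w_i x + b_i, 0 \},
  \qquad x \in [a, b],
\]
\[
  \mathcal{L}(\theta) = \int_a^b \bigl( \mathcal{N}_\theta(x) - f(x) \bigr)^2 \, dx.
\]
% Distinct parameter vectors theta can induce the same function
% \mathcal{N}_\theta, which is why the results count realization
% functions of critical points rather than critical points themselves.
```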

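As a concrete illustration of the objects in the abstract, here is a minimal numerical sketch of GD training for such a shallow ReLU network. The grid-based risk, the placeholder target $f(x) = |x|$, the step size, and all variable names are hypothetical choices for illustration, not the construction from the paper (whose target functions are specifically engineered to produce bad local minima).

```python
import numpy as np

# Minimal GD sketch for a shallow ReLU network on [a, b]; everything
# below (target f, grid size, step size, seed) is a hypothetical
# placeholder, not the construction from the paper.

a, b, H = -1.0, 1.0, 5       # interval [a, b] and number of hidden neurons
n = 1000                     # midpoint-rule grid for the risk integral
xs = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
dx = (b - a) / n
f = np.abs(xs)               # placeholder Lipschitz target f(x) = |x|

rng = np.random.default_rng(0)
w, bias = rng.normal(size=H), rng.normal(size=H)   # inner weights/biases
v, c = rng.normal(size=H), 0.0                     # outer weights/bias

def integrate(y):
    # midpoint-rule approximation of the integral of y over [a, b]
    return y.sum(axis=0) * dx

def risk(w, bias, v, c):
    # L(theta) = \int_a^b (N_theta(x) - f(x))^2 dx
    net = c + np.maximum(np.outer(xs, w) + bias, 0.0) @ v
    return integrate((net - f) ** 2)

eta = 1e-2                   # GD step size (temporal discretization of GF)
for step in range(2000):
    pre = np.outer(xs, w) + bias       # shape (n, H): w_i * x + b_i
    act = np.maximum(pre, 0.0)         # ReLU activations
    ind = (pre > 0.0).astype(float)    # generalized ReLU derivative (0 at 0)
    err = 2.0 * (c + act @ v - f)      # 2 * (N_theta(x) - f(x)) on the grid
    # gradients of the integral risk w.r.t. each parameter group
    gv = integrate(err[:, None] * act)
    gw = integrate(err[:, None] * ind * v * xs[:, None])
    gb = integrate(err[:, None] * ind * v)
    gc = integrate(err)
    w, bias, v, c = w - eta * gw, bias - eta * gb, v - eta * gv, c - eta * gc

print(f"risk after training: {risk(w, bias, v, c):.6f}")
```

The explicit update loop is exactly the "temporal discretization of GF" mentioned in the abstract: letting the step size tend to zero recovers the gradient flow whose bounded trajectories are the subject of the convergence result.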