Paper title
Gradient descent provably escapes saddle points in the training of shallow ReLU networks
Paper authors
Paper abstract
Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements. We explore its relevance for various machine learning tasks, with a particular focus on shallow rectified linear unit (ReLU) and leaky ReLU networks with scalar input. Building on a detailed examination of critical points of the square integral loss function for shallow ReLU and leaky ReLU networks relative to an affine target function, we show that gradient descent circumvents most saddle points. Furthermore, we prove convergence to global minima under favourable initialization conditions, quantified by an explicit threshold on the limiting loss.
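To make the setting concrete, below is a minimal sketch (not the paper's code) of the training problem the abstract describes: gradient descent on a shallow ReLU network with scalar input, fitting an affine target under a discretized surrogate of the square integral loss. The network width, grid, step size, and initialization are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Affine target f*(x) = alpha * x + beta on the interval [0, 1].
alpha, beta = 2.0, -0.5
xs = np.linspace(0.0, 1.0, 1001)        # grid discretizing the integral
target = alpha * xs + beta

# Shallow ReLU network f(x) = sum_j v_j * relu(w_j * x + b_j) + c.
width = 8                                # illustrative choice
w = rng.normal(size=width)
b = rng.normal(size=width)
v = rng.normal(size=width)
c = 0.0

lr = 1e-2                                # step size (illustrative)
for step in range(20_000):
    pre = np.outer(xs, w) + b            # (grid, width) pre-activations
    act = np.maximum(pre, 0.0)           # ReLU activations
    res = act @ v + c - target           # residual against the affine target

    # Riemann-sum surrogate for the square integral loss on [0, 1].
    loss = np.mean(res ** 2)

    # Gradients of the discretized loss with respect to all parameters
    # (the ReLU "derivative" at 0 is set to 0 here).
    mask = (pre > 0.0).astype(float)
    grad_v = 2.0 * act.T @ res / xs.size
    grad_w = 2.0 * v * (mask.T @ (res * xs)) / xs.size
    grad_b = 2.0 * v * (mask.T @ res) / xs.size
    grad_c = 2.0 * res.mean()

    v -= lr * grad_v
    w -= lr * grad_w
    b -= lr * grad_b
    c -= lr * grad_c

print(f"final discretized loss: {loss:.6f}")
```

Whether such a run reaches a global minimum or stalls near a saddle point depends on the initialization; the paper's convergence guarantee is stated in terms of an explicit threshold on the limiting loss, which this sketch does not attempt to reproduce.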