Paper Title

Global Convergence Analysis of Deep Linear Networks with A One-neuron Layer

Authors

Kun Chen, Dachao Lin, Zhihua Zhang

Abstract

In this paper, we follow Eftekhari's work to give a non-local convergence analysis of deep linear networks. Specifically, we consider optimizing deep linear networks that have a layer with one neuron under the quadratic loss. We describe the convergent points of trajectories with arbitrary starting points under gradient flow, including the paths that converge to one of the saddle points or the origin. We also show specific convergence rates for trajectories that converge to the global minimizer by stages. To achieve these results, this paper mainly extends the machinery in Eftekhari's work to provably identify the rank-stable set and the global minimizer convergent set. We also give specific examples to show the necessity of our definitions. Crucially, as far as we know, our results appear to be the first to give a non-local global analysis of linear neural networks from arbitrarily initialized points, rather than the lazy training regime that has dominated the neural network literature or the restricted benign initialization in Eftekhari's work. We also note that extending our results to general linear networks without the one-hidden-neuron assumption remains a challenging open problem.
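
To make the setting in the abstract concrete, the following is a minimal sketch of the kind of optimization problem being analyzed; the notation (factors $W_1,\dots,W_N$, the index $k$ of the one-neuron layer, and data matrices $X$, $Y$) is illustrative and not taken from the paper itself.

\[
\min_{W_1,\ldots,W_N} \; L(W_1,\ldots,W_N) \;=\; \tfrac{1}{2}\bigl\lVert W_N W_{N-1}\cdots W_1 X - Y \bigr\rVert_F^2,
\qquad W_i \in \mathbb{R}^{d_i \times d_{i-1}}, \;\; d_k = 1 \ \text{for some hidden layer } k,
\]
\[
\dot{W}_i(t) \;=\; -\,\frac{\partial L}{\partial W_i}\bigl(W_1(t),\ldots,W_N(t)\bigr), \qquad i = 1,\ldots,N,
\]

where the second display is the gradient flow whose trajectories the paper classifies by their limit points (a saddle point, the origin, or the global minimizer).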
