Paper Title
The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization
Paper Authors
Paper Abstract
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.
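The abstract's setup can be illustrated with a small simulation. The sketch below tracks the empirical 2x2 covariance matrix of the hidden features for two inputs, layer by layer, in a finite-width network at random initialization. The specific shaping used here is an illustrative assumption, not the paper's construction: a leaky ReLU whose negative slope approaches 1 at rate 1/sqrt(depth), so each layer is a small perturbation of a linear map and the layer-to-layer fluctuations the abstract describes can accumulate without the covariance degenerating immediately.

```python
import numpy as np

def shaped_mlp_covariance(x1, x2, width=256, depth=50, c=1.0, seed=0):
    """Track the 2x2 penultimate-layer covariance of a shaped MLP.

    Illustrative shaping (an assumption, not the paper's exact scheme):
        phi(z) = max(z, 0) + (1 - c/sqrt(depth)) * min(z, 0),
    a leaky ReLU whose negative slope tends to 1 as depth grows, so the
    network approaches a linear map in the deep limit.
    """
    rng = np.random.default_rng(seed)
    alpha = 1.0 - c / np.sqrt(depth)        # shaped negative slope
    # Normalize so that E[phi(Z)^2] = E[Z^2] for Z ~ N(0, 1), keeping
    # the variance of each coordinate roughly stable across layers.
    norm = np.sqrt((1.0 + alpha**2) / 2.0)
    h = np.stack([x1, x2], axis=1)          # features, shape (width, 2)
    covs = []
    for _ in range(depth):
        # i.i.d. Gaussian weights with standard 1/sqrt(width) scaling
        W = rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        z = W @ h
        h = np.where(z > 0.0, z, alpha * z) / norm
        covs.append(h.T @ h / width)        # empirical 2x2 covariance
    return covs

# Example: two antipodal inputs through a depth-50, width-256 network.
covs = shaped_mlp_covariance(np.ones(256), -np.ones(256))
```

Repeating this over many seeds gives an empirical distribution for the random covariance matrix; the paper's claim is that, in the joint depth-and-width limit with appropriately shaped activations, this layer-indexed process converges to the Neural Covariance SDE.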