Law of large numbers and central limit theorem for wide two-layer neural networks: the mini-batch and noisy case

Paper title

Law of large numbers and central limit theorem for wide two-layer neural networks: the mini-batch and noisy case

Authors

Arnaud Descours, Arnaud Guillin, Manon Michel, Boris Nectoux

Abstract

In this work, we consider a wide two-layer neural network and study the behavior of its empirical weights under the dynamics of stochastic gradient descent on the quadratic loss with mini-batches and noise. Our goal is to prove a trajectorial law of large numbers (LLN) as well as a central limit theorem (CLT) for their evolution. When the noise scales as $1/N^{\beta}$ with $1/2 < \beta \le \infty$, we rigorously derive and generalize the LLN obtained, for example, in [CRBVE20, MMM19, SS20b]. When $3/4 < \beta \le \infty$, we also generalize the CLT (see also [SS20a]) and further exhibit the effect of mini-batching on the asymptotic variance that drives the fluctuations. The case $\beta = 3/4$ is trickier: we give an example in which the variance diverges with time, thus establishing the instability of the predictions of the neural network in this case. This is illustrated by simple numerical examples.
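To make the setting in the abstract concrete, here is a minimal NumPy sketch of one noisy mini-batch SGD step for a wide two-layer network. Everything specific in it is an assumption for illustration and is not taken from the abstract: the mean-field output $g(x) = \frac{1}{N}\sum_i a_i \tanh(w_i \cdot x)$, the $\alpha/N$ step scaling, the Gaussian noise, and the names `predict` and `noisy_minibatch_sgd_step`. Only the $1/N^{\beta}$ noise scaling, the quadratic loss, and the averaging over a mini-batch come from the text.

```python
import numpy as np

def predict(W, X):
    """Mean-field two-layer network: g(x) = (1/N) * sum_i a_i * tanh(w_i . x).

    W has shape (N, d + 1): column 0 holds a_i, the remaining columns hold w_i.
    X has shape (B, d). Returns an array of shape (B,).
    """
    a, w = W[:, 0], W[:, 1:]
    return (np.tanh(X @ w.T) * a).mean(axis=1)

def noisy_minibatch_sgd_step(W, X, y, alpha, beta, rng):
    """One SGD step on the quadratic loss over the mini-batch (X, y).

    Assumed update (hypothetical scaling, for illustration only):
      W_i <- W_i + (alpha / N) * mean_batch[(y - g(x)) * grad sigma(W_i, x)]
                 + eps_i / N**beta,   eps_i standard Gaussian.
    beta = np.inf switches the noise off.
    """
    N = W.shape[0]
    a, w = W[:, 0], W[:, 1:]
    h = np.tanh(X @ w.T)                      # (B, N) hidden activations
    resid = y - (h * a).mean(axis=1)          # (B,) prediction errors

    # Gradient of a_i * tanh(w_i . x) with respect to a_i and w_i.
    grad_a = h                                               # (B, N)
    grad_w = (a * (1.0 - h**2))[:, :, None] * X[:, None, :]  # (B, N, d)

    # Average the per-sample directions over the mini-batch.
    da = (resid[:, None] * grad_a).mean(axis=0)              # (N,)
    dw = (resid[:, None, None] * grad_w).mean(axis=0)        # (N, d)
    drift = np.column_stack([da, dw])                        # (N, d + 1)

    if np.isinf(beta):
        noise = np.zeros_like(W)
    else:
        noise = rng.standard_normal(W.shape) / N**beta
    return W + (alpha / N) * drift + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d, B = 500, 2, 16
    W = rng.standard_normal((N, d + 1))
    for _ in range(1000):
        X = rng.standard_normal((B, d))       # fresh mini-batch each step
        y = np.sin(X[:, 0])                   # toy regression target
        W = noisy_minibatch_sgd_step(W, X, y, alpha=1.0, beta=1.0, rng=rng)
    print("updated weights:", W.shape)
```

In this sketch, `beta` controls how fast the injected noise vanishes as the width `N` grows, and the batch size `B` controls how many samples are averaged per step; these are the two knobs whose effect on the limiting behavior the abstract discusses.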

Paper title