Paper Title

Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks

Paper Authors

Ilja Kuzborskij, Csaba Szepesvári

Paper Abstract

We explore the ability of overparameterized shallow ReLU neural networks to learn Lipschitz, nondifferentiable, bounded functions with additive noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noise, neural networks trained to nearly zero training error are inconsistent in this class, we focus on the early-stopped GD which allows us to show consistency and optimal rates. In particular, we explore this problem from the viewpoint of the Neural Tangent Kernel (NTK) approximation of a GD-trained finite-width neural network. We show that whenever some early stopping rule is guaranteed to give an optimal rate (of excess risk) on the Hilbert space of the kernel induced by the ReLU activation function, the same rule can be used to achieve minimax optimal rate for learning on the class of considered Lipschitz functions by neural networks. We discuss several data-free and data-dependent practically appealing stopping rules that yield optimal rates.
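
As a rough, self-contained illustration of the setting described in the abstract (not the paper's algorithm, rates, or stopping rules), the sketch below trains a shallow overparameterized ReLU network by full-batch gradient descent on a noisy, nondifferentiable Lipschitz target and stops early using a simple hold-out (patience) rule. The target |x_1|, the network width, the learning rate, and the patience-based stopping criterion are all illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def shallow_relu_net(X, W, a):
    """f(x) = (1/sqrt(m)) * sum_r a_r * relu(<w_r, x>)  (standard NTK-style scaling)."""
    m = W.shape[0]
    return relu(X @ W.T) @ a / np.sqrt(m)

rng = np.random.default_rng(0)
n, d, m = 200, 2, 2048                                 # samples, input dimension, width (overparameterized)
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)          # inputs on the unit sphere
y = np.abs(X[:, 0]) + 0.1 * rng.standard_normal(n)     # |x_1|: 1-Lipschitz, nondifferentiable, plus additive noise

# hold-out split used by the (illustrative) data-dependent stopping rule
n_tr = n // 2
X_tr, y_tr, X_val, y_val = X[:n_tr], y[:n_tr], X[n_tr:], y[n_tr:]

# random initialization; only the hidden layer W is trained, the output signs a stay fixed
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

lr, max_iters, patience = 1.0, 2000, 50
best_val, best_iter = np.inf, 0
for t in range(max_iters):
    resid = shallow_relu_net(X_tr, W, a) - y_tr        # residuals on the training set
    act = (X_tr @ W.T > 0).astype(float)               # ReLU activation pattern
    # full-batch GD step on the squared loss (subgradient through the ReLU)
    grad_W = ((resid[:, None] * act) * a[None, :]).T @ X_tr / (np.sqrt(m) * n_tr)
    W -= lr * grad_W

    val_err = np.mean((shallow_relu_net(X_val, W, a) - y_val) ** 2)
    if val_err < best_val:
        best_val, best_iter = val_err, t
    elif t - best_iter > patience:                     # stop once hold-out error stalls
        break

print(f"stopped at iteration {t}, best hold-out MSE = {best_val:.4f}")
```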
