Paper Title

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

Paper Authors

Ben Adlam, Jeffrey Pennington

Abstract

Modern deep learning models employ considerably more parameters than required to fit the training data. Whereas conventional statistical wisdom suggests such models should drastically overfit, in practice these models generalize remarkably well. An emerging paradigm for describing this unexpected behavior is in terms of a "double descent" curve, in which increasing a model's capacity causes its test error to first decrease, then increase to a maximum near the interpolation threshold, and then decrease again in the overparameterized regime. Recent efforts to explain this phenomenon theoretically have focused on simple settings, such as linear regression or kernel regression with unstructured random features, which we argue are too coarse to reveal important nuances of actual neural networks. We provide a precise high-dimensional asymptotic analysis of generalization under kernel regression with the Neural Tangent Kernel, which characterizes the behavior of wide neural networks optimized with gradient descent. Our results reveal that the test error has non-monotonic behavior deep in the overparameterized regime and can even exhibit additional peaks and descents when the number of parameters scales quadratically with the dataset size.
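The predictor the abstract refers to is kernel (ridge) regression with the Neural Tangent Kernel. The sketch below is a minimal, self-contained illustration of that setup using the standard closed-form NTK of a one-hidden-layer ReLU network; the synthetic data model (Gaussian inputs, linear teacher plus noise), the problem sizes, the ridge value, the kernel normalization, and the helper name relu_ntk are illustrative assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch: kernel ridge regression with a closed-form single-hidden-layer
# ReLU NTK on synthetic data. All sizes and the data model are assumptions made
# for illustration; they do not reproduce the paper's experiments or analysis.
import numpy as np

def relu_ntk(X1, X2):
    """Standard closed-form NTK of a one-hidden-layer ReLU network (one common normalization)."""
    norms1 = np.linalg.norm(X1, axis=1, keepdims=True)   # (n1, 1)
    norms2 = np.linalg.norm(X2, axis=1, keepdims=True)   # (n2, 1)
    dots = X1 @ X2.T                                     # (n1, n2)
    cos = np.clip(dots / (norms1 * norms2.T), -1.0, 1.0)
    theta = np.arccos(cos)
    # Contribution from training the output-layer weights (arc-cosine kernel of degree 1) ...
    k_top = (norms1 * norms2.T) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi
    # ... plus the contribution from training the input-layer weights.
    k_bottom = dots * (np.pi - theta) / np.pi
    return k_top + k_bottom

rng = np.random.default_rng(0)
d, n_train, n_test, ridge = 50, 200, 500, 1e-3          # illustrative sizes and ridge

beta = rng.normal(size=d) / np.sqrt(d)                   # assumed linear teacher
X_train = rng.normal(size=(n_train, d)) / np.sqrt(d)
X_test = rng.normal(size=(n_test, d)) / np.sqrt(d)
y_train = X_train @ beta + 0.1 * rng.normal(size=n_train)
y_test = X_test @ beta

# Kernel ridge regression with the NTK: alpha = (K + lambda * I)^{-1} y.
K_train = relu_ntk(X_train, X_train)
alpha = np.linalg.solve(K_train + ridge * np.eye(n_train), y_train)
y_pred = relu_ntk(X_test, X_train) @ alpha
print("NTK regression test MSE:", np.mean((y_pred - y_test) ** 2))
```

Note that the kernel above is the infinite-width limiting object and has no width parameter of its own; the width-dependent peaks and descents described in the abstract come from the paper's finite-width, high-dimensional analysis, which this sketch does not attempt to reproduce.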
