Paper Title
Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence
Paper Authors
Paper Abstract
A major challenge in modern machine learning is theoretically understanding the generalization properties of overparameterized models. Many existing tools rely on uniform convergence (UC), a property that, when it holds, guarantees that the test loss will be close to the training loss, uniformly over a class of candidate models. Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails. Our main contribution is proving novel generalization bounds in two such settings, one linear and one non-linear. We study the linear classification setting of Nagarajan and Kolter, and a quadratic ground-truth function learned via a two-layer neural network in the non-linear regime. We prove a new type of margin bound showing that, above a certain signal-to-noise threshold, any near-max-margin classifier will achieve almost no test loss in these two settings. Our results show that being near-max-margin is important: while any model that achieves at least a $(1-\epsilon)$-fraction of the max margin generalizes well, a classifier achieving only half of the max margin may fail terribly. Building on the impossibility results of Nagarajan and Kolter, we show, under slightly stronger assumptions, that one-sided UC bounds and classical margin bounds will fail on near-max-margin classifiers. Our analysis provides insight into why memorization can coexist with generalization: we show that in this challenging regime where generalization occurs but UC fails, near-max-margin classifiers simultaneously contain some generalizable components and some overfitting components that memorize the data. The presence of the overfitting components is enough to preclude UC, but the near-extremal margin guarantees that sufficiently many generalizable components are present.
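To make the abstract's central notion concrete, here is a brief illustrative sketch (the notation below is ours, not taken verbatim from the paper): for a linear classifier $w$ on training data $(x_1, y_1), \ldots, (x_n, y_n)$ with labels $y_i \in \{-1, +1\}$, the normalized margin and the max margin are
$$\gamma(w) = \min_{1 \le i \le n} \frac{y_i \langle w, x_i \rangle}{\lVert w \rVert_2}, \qquad \gamma^{\star} = \max_{w \neq 0} \gamma(w),$$
and $w$ is near-max-margin if $\gamma(w) \ge (1-\epsilon)\,\gamma^{\star}$ for small $\epsilon$. The paper's positive result states that, above the signal-to-noise threshold, every such $w$ generalizes well, whereas a classifier satisfying only $\gamma(w) \ge \tfrac{1}{2}\gamma^{\star}$ may generalize poorly.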