Paper Title
Why Deep Learning Generalizes
Paper Authors

Paper Abstract
Very large deep learning models trained using gradient descent are remarkably resistant to memorization given their huge capacity, but are at the same time capable of fitting large datasets of pure noise. Here methods are introduced by which models may be trained to memorize datasets that normally are generalized. We find that memorization is difficult relative to generalization, but that adding noise makes memorization easier. Increasing the dataset size exaggerates the characteristics of that dataset: model access to more training samples makes overfitting easier for random data, but somewhat harder for natural images. The bias of deep learning towards generalization is explored theoretically, and we show that generalization results from a model's parameters being attracted to points of maximal stability with respect to that model's inputs during gradient descent.
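The memorization experiments described above can be illustrated with a short training loop on a dataset of pure noise, where there is no structure to generalize and the only way to fit the labels is to memorize them. The following is a minimal sketch, not the paper's actual experimental setup: the architecture, dataset size, optimizer settings, and step counts are illustrative assumptions.

```python
# Minimal sketch: fitting a dataset of pure noise with gradient descent.
# All choices below (model, dataset size, hyperparameters) are illustrative
# assumptions, not the setup used in the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Pure noise" dataset: random images paired with random labels,
# so the network can only fit them by memorization.
n_samples, n_classes = 512, 10
images = torch.randn(n_samples, 3, 32, 32)
labels = torch.randint(0, n_classes, (n_samples,))

# Small over-parameterized classifier (hypothetical architecture).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, n_classes),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Full-batch gradient descent; training accuracy approaching 1.0 indicates
# the noise dataset has been memorized.
for step in range(1000):
    optimizer.zero_grad()
    logits = model(images)
    loss = loss_fn(logits, labels)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        acc = (logits.argmax(dim=1) == labels).float().mean().item()
        print(f"step {step:4d}  loss {loss.item():.3f}  train acc {acc:.3f}")
```

Replacing the random images and labels with a natural-image dataset in the same loop gives the contrasting case the abstract describes: the same capacity and optimizer, but fitting driven by generalizable structure rather than memorization.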