Paper Title
Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime
Paper Authors
Paper Abstract
Overparameterization is known to permit strong generalization performance in neural networks. In this work, we provide an initial theoretical analysis of its effect on catastrophic forgetting in a continual learning setup. We show experimentally that on permuted MNIST image classification tasks, the generalization performance of multilayer perceptrons trained by vanilla stochastic gradient descent can be improved by overparameterization, and that the extent of the performance increase achieved by overparameterization is comparable to that of state-of-the-art continual learning algorithms. We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem, in which the two tasks are related by a random orthogonal transformation. We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small if the model is sufficiently overparameterized.
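The abstract's two-task linear regression setup can be illustrated numerically. The sketch below is a minimal NumPy illustration of a qualitatively similar construction, not the paper's exact model: the dimensions (d = 1000, n = 50), the helper names (`random_orthogonal`, `fit`, `task1_risk`), and the choice of relating the second task's ground-truth parameter by an orthogonal matrix Q are all illustrative assumptions. It exploits the fact that gradient descent on an underdetermined least-squares problem, started from a given initialization, converges to the interpolating solution closest to that initialization, which can be computed in closed form with the pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthogonal(d, rng):
    # Haar-distributed orthogonal matrix via QR of a Gaussian matrix,
    # with the usual sign correction from the diagonal of R.
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

d, n = 1000, 50                                  # overparameterized regime: d >> n
w_star = rng.standard_normal(d) / np.sqrt(d)     # task-1 ground truth, ||w*|| ~ 1

# Task 1: y = <w*, x> with isotropic Gaussian inputs.
X1 = rng.standard_normal((n, d))
y1 = X1 @ w_star

# Task 2 (assumed construction): same input distribution, but the effective
# parameter is Q @ w_star, mimicking inputs rotated/permuted by Q.
Q = random_orthogonal(d, rng)
X2 = rng.standard_normal((n, d))
y2 = X2 @ (Q @ w_star)

def fit(X, y, w_init):
    # Gradient descent on squared loss from w_init converges to the
    # interpolating solution closest to w_init; compute it in closed form.
    return w_init + np.linalg.pinv(X) @ (y - X @ w_init)

w_after_task1 = fit(X1, y1, np.zeros(d))      # train on task 1 from scratch
w_after_task2 = fit(X2, y2, w_after_task1)    # then on task 2, no regularization

def task1_risk(w):
    # Population risk on task 1 for isotropic Gaussian inputs: ||w - w*||^2.
    return float(np.sum((w - w_star) ** 2))

print("task-1 risk after task 1:", task1_risk(w_after_task1))
print("task-1 risk after task 2:", task1_risk(w_after_task2))
print("forgetting (risk gain):  ",
      task1_risk(w_after_task2) - task1_risk(w_after_task1))
```

In this toy setting, making the model more overparameterized (shrinking n relative to d) shrinks the printed risk gain on the first task, which is the qualitative behavior the abstract describes.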