Paper Title
Understanding the Role of Training Regimes in Continual Learning
Paper Authors
Paper Abstract
Catastrophic forgetting affects the training of neural networks, limiting their ability to learn multiple tasks sequentially. From the perspective of the well-established plasticity-stability dilemma, neural networks tend to be overly plastic, lacking the stability necessary to prevent the forgetting of previous knowledge, which means that as learning progresses, networks tend to forget previously seen tasks. This phenomenon, coined catastrophic forgetting in the continual learning literature, has attracted much attention lately, and several families of approaches have been proposed with different degrees of success. However, there has been limited prior work extensively analyzing the impact that different training regimes (learning rate, batch size, regularization method) can have on forgetting. In this work, we depart from the typical approach of altering the learning algorithm to improve stability. Instead, we hypothesize that the geometrical properties of the local minima found for each task play an important role in the overall degree of forgetting. In particular, we study the effect of dropout, learning rate decay, and batch size on forming training regimes that widen the tasks' local minima and, consequently, help the network avoid catastrophic forgetting. Our study provides practical insights to improve stability via simple yet effective techniques that outperform alternative baselines.
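The training-regime knobs the abstract names (dropout, learning rate decay, small batch size) can be sketched in a minimal sequential-learning loop. The following PyTorch sketch is not the authors' implementation: the synthetic tasks, the MLP architecture, and every hyperparameter value are illustrative assumptions. It trains two tasks one after the other under such a regime and reports how much task-1 accuracy drops after learning task 2, i.e., a simple measure of forgetting.

```python
# Minimal sketch (assumed setup, not the paper's code) of a "stable" training
# regime: dropout, per-epoch learning-rate decay, and a small batch size,
# applied while learning two synthetic tasks sequentially.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

def make_task(n=2000, dim=32, n_classes=5):
    """Synthetic classification task: one Gaussian cluster per class."""
    centers = torch.randn(n_classes, dim) * 3.0
    y = torch.randint(0, n_classes, (n,))
    x = centers[y] + torch.randn(n, dim)
    return TensorDataset(x, y)

class MLP(nn.Module):
    def __init__(self, dim=32, hidden=256, n_classes=5, p_drop=0.25):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def accuracy(model, dataset):
    model.eval()  # disables dropout for evaluation
    x, y = dataset.tensors
    return (model(x).argmax(dim=1) == y).float().mean().item()

def train_task(model, dataset, epochs=5, lr=0.05, batch_size=16, lr_decay=0.7):
    """Train one task with small batches and a decaying learning rate."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=lr_decay)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        sched.step()  # decay the learning rate after each epoch

task1, task2 = make_task(), make_task()
model = MLP()

train_task(model, task1)
acc1_before = accuracy(model, task1)

train_task(model, task2)  # sequential learning: no access to task-1 data
acc1_after = accuracy(model, task1)

print(f"task 1 accuracy before/after task 2: {acc1_before:.3f} / {acc1_after:.3f}")
print(f"forgetting on task 1: {acc1_before - acc1_after:.3f}")
```

Re-running the same loop with dropout removed, a constant learning rate, and a large batch size gives a rough point of comparison for the stability gap the abstract attributes to the training regime; the specific numbers from such a toy setup are, of course, only illustrative.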