Paper Title
Rebalancing Batch Normalization for Exemplar-based Class-Incremental Learning
Paper Authors
Paper Abstract
Batch Normalization (BN) and its variants have been extensively studied for neural nets in various computer vision tasks, but relatively little work has been dedicated to studying the effect of BN in continual learning. To that end, we develop a new update patch for BN, particularly tailored for exemplar-based class-incremental learning (CIL). The main issue of BN in CIL is the imbalance of training data between current and past tasks in a mini-batch, which makes the empirical mean and variance, as well as the learnable affine transformation parameters of BN, heavily biased toward the current task -- contributing to the forgetting of past tasks. While a recent BN variant has been developed for "online" CIL, in which training is done in a single epoch, we show that their method does not necessarily bring gains in "offline" CIL, in which a model is trained for multiple epochs on the imbalanced training data. The main reason for this ineffectiveness lies in not fully addressing the data imbalance issue, especially when computing the gradients for learning the affine transformation parameters of BN. Accordingly, we propose a new hyperparameter-free variant, dubbed Task-Balanced BN (TBBN), which more correctly resolves the imbalance issue by constructing a horizontally concatenated, task-balanced batch using reshape and repeat operations during training. Based on our experiments on class-incremental learning with CIFAR-100, ImageNet-100, and five dissimilar task datasets, we demonstrate that TBBN, which behaves exactly like vanilla BN at inference time, is easily applicable to most existing exemplar-based offline CIL algorithms and consistently outperforms other BN variants.
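The abstract only describes TBBN at a high level, so the following PyTorch sketch illustrates the general idea of balancing BN statistics (and thus affine-parameter gradients) between current-task samples and past-task exemplars. The class name TaskBalancedBN2d, the n_cur argument, and the simple repeat-and-concatenate scheme are assumptions made for illustration; they are not the authors' exact reshape/concatenation algorithm.

```python
import torch
import torch.nn as nn


class TaskBalancedBN2d(nn.BatchNorm2d):
    """Illustrative task-balanced BN layer (a sketch, not the paper's exact TBBN).

    Assumption: during training, the first `n_cur` samples of a mini-batch come
    from the current task and the remaining samples are past-task exemplars.
    """

    def forward(self, x, n_cur=None):
        # Inference, or no task split provided: behave exactly like vanilla BN.
        if not self.training or n_cur is None or n_cur >= x.size(0):
            return super().forward(x)

        cur, past = x[:n_cur], x[n_cur:]
        # Repeat the (few) past-task exemplars so they contribute to the batch
        # statistics, and to the affine-parameter gradients, on a roughly equal
        # footing with the (many) current-task samples.
        reps = max(1, n_cur // past.size(0))
        balanced = torch.cat([cur, past.repeat(reps, 1, 1, 1)], dim=0)

        out = super().forward(balanced)
        # Return normalized activations only for the original (non-repeated) samples.
        return torch.cat([out[:n_cur], out[n_cur:n_cur + past.size(0)]], dim=0)


# Hypothetical usage: a batch of 32 samples, of which the first 28 belong to the
# current task and the last 4 are stored exemplars of past tasks.
bn = TaskBalancedBN2d(64)
x = torch.randn(32, 64, 8, 8)
y = bn(x, n_cur=28)
```

Because the balancing happens only while computing training statistics and gradients, the layer reduces to standard BatchNorm2d at inference time, which matches the abstract's claim that TBBN is a drop-in training-time patch.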