Paper Title
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization
Paper Authors
Paper Abstract
Batch Normalization (BN) is one of the most widely used techniques in deep learning. However, its performance can degrade severely when the batch size is insufficient. This weakness limits the use of BN in many computer vision tasks, such as detection or segmentation, where the batch size is usually small due to memory constraints. Many modified normalization techniques have therefore been proposed, but they either fail to fully restore the performance of BN or have to introduce additional nonlinear operations in the inference procedure, incurring significant extra cost. In this paper, we reveal that two extra batch statistics are involved in the backward propagation of BN, which have never been well discussed before. These extra batch statistics, associated with the gradients, can also severely affect the training of deep neural networks. Based on our analysis, we propose a novel normalization method named Moving Average Batch Normalization (MABN). MABN can completely restore the performance of vanilla BN in small-batch cases without introducing any additional nonlinear operations in the inference procedure. We demonstrate the benefits of MABN through both theoretical analysis and experiments. Our experiments confirm the effectiveness of MABN on multiple computer vision tasks, including ImageNet classification and COCO object detection. The code has been released at https://github.com/megvii-model/MABN.
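
To make the two backward-pass statistics concrete: in vanilla BN, the forward pass estimates a mean and a variance from the mini-batch, and the gradient with respect to the input additionally depends on the batch means of dL/dxhat and of (dL/dxhat) * xhat. The NumPy sketch below shows where these four per-batch statistics appear; the moving-average buffer at the end only illustrates the general idea of replacing noisy per-batch estimates with exponential moving averages and is not the authors' full MABN (names such as bn_forward, bn_backward, and MovingAverageStats are hypothetical; see the released code for the actual method).

import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    # Forward pass of vanilla BN: two batch statistics, mu and var.
    mu = x.mean(axis=0)                      # batch statistic (forward)
    var = x.var(axis=0)                      # batch statistic (forward)
    inv_std = 1.0 / np.sqrt(var + eps)
    xhat = (x - mu) * inv_std
    return gamma * xhat + beta, (xhat, inv_std)

def bn_backward(dy, gamma, cache):
    # Backward pass of vanilla BN: two *extra* batch statistics,
    # g1 = E_B[dL/dxhat] and g2 = E_B[(dL/dxhat) * xhat], both
    # estimated from the current mini-batch.
    xhat, inv_std = cache
    dxhat = dy * gamma
    g1 = dxhat.mean(axis=0)                  # extra batch statistic (backward)
    g2 = (dxhat * xhat).mean(axis=0)         # extra batch statistic (backward)
    dx = inv_std * (dxhat - g1 - xhat * g2)
    dgamma = (dy * xhat).sum(axis=0)
    dbeta = dy.sum(axis=0)
    return dx, dgamma, dbeta

class MovingAverageStats:
    # Illustrative only: exponential moving averages of the four
    # statistics above, to stand in for the per-batch values when
    # the batch is too small for reliable estimates.
    def __init__(self, num_features, momentum=0.98):
        self.momentum = momentum
        self.buffers = {k: np.zeros(num_features) for k in ("mu", "var", "g1", "g2")}

    def update(self, **batch_stats):
        for name, value in batch_stats.items():
            old = self.buffers[name]
            self.buffers[name] = self.momentum * old + (1 - self.momentum) * value
        return self.buffers

With a small batch, all four quantities computed inside bn_forward and bn_backward are high-variance estimates; smoothing them across iterations, as sketched in MovingAverageStats, is the intuition behind replacing batch statistics with moving-average statistics in both propagation directions.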