Paper Title
Delving into the Estimation Shift of Batch Normalization in a Network
Paper Authors
Paper Abstract
Batch normalization (BN) is a milestone technique in deep learning. It normalizes the activations using mini-batch statistics during training but uses the estimated population statistics during inference. This paper focuses on investigating the estimation of population statistics. We define the estimation shift magnitude of BN to quantitatively measure the difference between its estimated population statistics and the expected ones. Our primary observation is that the estimation shift can accumulate due to the stack of BNs in a network, which has detrimental effects on test performance. We further find that a batch-free normalization (BFN) can block such an accumulation of estimation shift. These observations motivate our design of XBNBlock, which replaces one BN with a BFN in the bottleneck block of residual-style networks. Experiments on the ImageNet and COCO benchmarks show that XBNBlock consistently improves the performance of different architectures, including ResNet and ResNeXt, by a significant margin, and appears to be more robust to distribution shift.
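To make the train/inference discrepancy behind the estimation shift concrete, here is a minimal PyTorch sketch that contrasts a BN layer's estimated population statistics (its running mean/variance, accumulated from mini-batches during training) with statistics computed directly over a large held-out sample standing in for the expected population statistics. The function name `estimation_shift_magnitude` and the particular distance metric are illustrative assumptions, not the paper's exact definition.

```python
# Minimal sketch (assumptions: the metric and names below are illustrative,
# not the paper's exact definition of estimation shift magnitude).
import torch
import torch.nn as nn

def estimation_shift_magnitude(bn: nn.BatchNorm2d, population: torch.Tensor) -> float:
    """Compare BN's estimated population statistics (running mean/var)
    with statistics computed directly over a large 'population' sample."""
    # "Expected" population statistics, computed over all samples at once.
    expected_mean = population.mean(dim=(0, 2, 3))
    expected_var = population.var(dim=(0, 2, 3), unbiased=False)
    # One possible distance between estimated and expected statistics.
    mean_shift = (bn.running_mean - expected_mean).abs().mean().item()
    var_shift = (bn.running_var - expected_var).abs().mean().item()
    return mean_shift + var_shift

bn = nn.BatchNorm2d(16)
bn.train()
# Simulate training: running statistics are updated from small mini-batches.
for _ in range(100):
    bn(torch.randn(8, 16, 32, 32))
bn.eval()
population = torch.randn(1024, 16, 32, 32)  # stand-in for the test population
print(estimation_shift_magnitude(bn, population))
```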
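The XBNBlock design itself replaces one BN with a batch-free normalization inside a bottleneck block. The sketch below uses GroupNorm as an illustrative batch-free normalizer and a standard ResNet-style bottleneck layout; the paper's specific BFN choice and the exact position of the replaced BN are assumptions here, not confirmed details.

```python
# Sketch of an XBNBlock-style bottleneck (assumptions: GroupNorm stands in
# for the batch-free normalization, and the middle BN is the one replaced;
# the paper's exact BFN and placement may differ).
import torch
import torch.nn as nn

class XBNBottleneck(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int, groups: int = 32):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False)
        # Batch-free normalization in place of the second BN: no running
        # statistics are estimated, so no estimation shift is propagated here.
        self.bfn = nn.GroupNorm(groups, mid_ch)
        self.conv3 = nn.Conv2d(mid_ch, in_ch, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bfn(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + x)  # residual connection

block = XBNBottleneck(256, 64)
print(block(torch.randn(2, 256, 56, 56)).shape)  # torch.Size([2, 256, 56, 56])
```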