Paper Title

Gradient Centralization: A New Optimization Technique for Deep Neural Networks

Authors

Hongwei Yong, Jianqiang Huang, Xiansheng Hua, Lei Zhang

Abstract

Optimization techniques are of great importance to effectively and efficiently train a deep neural network (DNN). It has been shown that using the first and second order statistics (e.g., mean and variance) to perform Z-score standardization on network activations or weight vectors, such as batch normalization (BN) and weight standardization (WS), can improve the training performance. Different from these existing methods that mostly operate on activations or weights, we present a new optimization technique, namely gradient centralization (GC), which operates directly on gradients by centralizing the gradient vectors to have zero mean. GC can be viewed as a projected gradient descent method with a constrained loss function. We show that GC can regularize both the weight space and output feature space so that it can boost the generalization performance of DNNs. Moreover, GC improves the Lipschitzness of the loss function and its gradient so that the training process becomes more efficient and stable. GC is very simple to implement and can be easily embedded into existing gradient based DNN optimizers with only one line of code. It can also be directly used to fine-tune the pre-trained DNNs. Our experiments on various applications, including general image classification, fine-grained image classification, detection and segmentation, demonstrate that GC can consistently improve the performance of DNN learning. The code of GC can be found at https://github.com/Yonghongwei/Gradient-Centralization.
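Below is a minimal sketch of the gradient centralization operation as described in the abstract, assuming a PyTorch setting; the helper names `centralize_gradient` and `sgd_step_with_gc` are illustrative only, and the authors' official implementation is the one in the linked repository. The sketch subtracts, from each multi-dimensional weight gradient, its mean taken over all dimensions except the output-channel dimension, then applies a plain SGD update.

```python
import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    """Center a weight gradient to zero mean (Gradient Centralization sketch).

    For a matrix/tensor gradient (e.g., conv weights of shape
    (out_ch, in_ch, kH, kW) or FC weights of shape (out, in)), subtract the
    mean computed over all dimensions except dim 0, so each output channel's
    gradient vector has zero mean. 1-D gradients (e.g., biases) are left
    unchanged, since GC is described for weight matrices/tensors.
    """
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad

def sgd_step_with_gc(params, lr=0.1):
    """Illustrative plain-SGD update with GC applied to each weight gradient."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * centralize_gradient(p.grad)
```

The centralization itself is the single `grad - grad.mean(...)` line, which is consistent with the abstract's claim that GC can be embedded into existing gradient-based optimizers with one line of code.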
