Paper Title
S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Paper Authors
Paper Abstract
The stochastic gradient descent (SGD) method is the most widely used method for deep neural network (DNN) training. However, it does not always converge to a flat minimum of the loss surface, which is associated with high generalization capability. Weight-noise injection with SGD has been studied extensively as a way of finding flat minima. We devise a new weight-noise-injection-based SGD method that adds symmetrical noise to the DNN weights. Training with symmetrical noise evaluates the loss surface at two adjacent points, by which convergence to sharp minima can be avoided. Symmetric noise of fixed magnitude is added to minimize training instability. The proposed method is compared with the conventional SGD method and previous weight-noise injection algorithms using convolutional neural networks for image classification. In particular, performance improvements in large-batch training are demonstrated. The method shows superior performance compared with conventional SGD and weight-noise injection methods regardless of the batch size and learning-rate scheduling algorithm.
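To illustrate the idea of evaluating the loss at two symmetrically perturbed weight points, the following is a minimal sketch, not the authors' reference implementation. It assumes PyTorch, a fixed noise magnitude `noise_scale`, and a plain SGD update that averages the gradients obtained at `w + e` and `w - e`; the exact noise distribution and update rule used in the paper may differ.

```python
# Sketch of a symmetric weight-noise SGD step (assumed form, not the paper's code).
import torch

def s_sgd_step(model, loss_fn, x, y, lr=0.1, noise_scale=1e-2):
    """Evaluate gradients at w + e and w - e, average them, then update w."""
    params = [p for p in model.parameters() if p.requires_grad]
    # Fixed-magnitude symmetric noise: random sign times a constant scale (assumption).
    noises = [noise_scale * torch.sign(torch.randn_like(p)) for p in params]

    grads = []
    for sign in (+1.0, -1.0):
        # Perturb the weights to one side of the symmetric pair.
        with torch.no_grad():
            for p, e in zip(params, noises):
                p.add_(sign * e)
        model.zero_grad()
        loss_fn(model(x), y).backward()
        grads.append([p.grad.detach().clone() for p in params])
        # Restore the original weights before the next evaluation.
        with torch.no_grad():
            for p, e in zip(params, noises):
                p.sub_(sign * e)

    # Average the two gradients and apply a plain SGD update.
    with torch.no_grad():
        for p, g_plus, g_minus in zip(params, grads[0], grads[1]):
            p.sub_(lr * 0.5 * (g_plus + g_minus))
```

Averaging the two gradients is what makes the step sensitive to curvature around the current weights: a sharp minimum gives very different gradients at the two perturbed points, which discourages convergence there.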