Paper Title

Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

Paper Authors

Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste

Paper Abstract

We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models computed independently and in parallel. The resulting models generalize equally well as those trained with small mini-batches but are produced in a substantially shorter time. We demonstrate the reduction in training time and the good generalization performance of the resulting models on the computer vision datasets CIFAR10, CIFAR100, and ImageNet.
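As a rough illustration of the two-phase idea described in the abstract (this is not the authors' implementation; the toy model, data, and hyper-parameters below are invented for the example), the following NumPy sketch first runs a quick large-batch phase, then lets several workers refine the same checkpoint independently with small batches, and finally averages their weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (a stand-in for a real dataset such as CIFAR10).
X = rng.normal(size=(1024, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1024)

def sgd(w, X, y, lr=0.05, batch_size=32, steps=200, seed=0):
    """Plain mini-batch SGD on squared error, starting from weights w."""
    local_rng = np.random.default_rng(seed)
    w = w.copy()
    for _ in range(steps):
        idx = local_rng.choice(len(X), size=batch_size, replace=False)
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= lr * grad
    return w

# Phase 1 (stand-in): a short large-batch run gives an approximate solution.
w_init = np.zeros(10)
w_large_batch = sgd(w_init, X, y, batch_size=512, steps=50, seed=1)

# Phase 2: several workers refine the same checkpoint independently with
# small batches (run sequentially here; in SWAP they run in parallel).
workers = [sgd(w_large_batch, X, y, batch_size=32, steps=200, seed=s)
           for s in range(4)]

# Final model: the element-wise average of the workers' weights.
w_swap = np.mean(workers, axis=0)
print("MSE of averaged model:", np.mean((X @ w_swap - y) ** 2))
```

The averaging step is what lets each worker use small mini-batches (and thus retain small-batch generalization) while the wall-clock cost stays close to that of the fast large-batch phase.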
