Paper Title
Block-wise Training of Residual Networks via the Minimizing Movement Scheme
Paper Authors
Paper Abstract
End-to-end backpropagation has a few shortcomings: it requires loading the entire model during training, which can be impossible in constrained settings, and suffers from three locking problems (forward locking, update locking and backward locking), which prohibit training the layers in parallel. Solving layer-wise optimization problems can address these problems and has been used in on-device training of neural networks. We develop a layer-wise training method, particularly well-adapted to ResNets, inspired by the minimizing movement scheme for gradient flows in distribution space. The method amounts to a kinetic energy regularization of each block that makes the blocks optimal transport maps and endows them with regularity. It works by alleviating the stagnation problem observed in layer-wise training, whereby greedily-trained early layers overfit and deeper layers stop increasing test accuracy after a certain depth. We show on classification tasks that the test accuracy of block-wise trained ResNets is improved when using our method, whether the blocks are trained sequentially or in parallel.
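To make the idea concrete, here is a minimal sketch of greedy block-wise training with a kinetic energy penalty, not the authors' reference implementation. It assumes a residual update x + f(x), reads "kinetic energy regularization" as a penalty on ||f(x)||², and attaches an auxiliary classifier to each block so it can be trained without backpropagating through the rest of the network. All names (ResidualBlock, train_block, kinetic_weight, head) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """A toy ResNet block computing x + f(x); also returns the residual f(x)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        r = self.f(x)        # residual: the "velocity" the block applies to x
        return x + r, r

def train_block(block, head, loader, kinetic_weight=0.1, epochs=1, lr=1e-3):
    """Greedily train one block plus its auxiliary classifier `head`.
    Inputs in `loader` are assumed to be the frozen outputs of earlier blocks."""
    opt = torch.optim.Adam(list(block.parameters()) + list(head.parameters()), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            out, r = block(x)
            loss = F.cross_entropy(head(out), y)
            # Kinetic energy term: penalizes how far the block transports its input,
            # one reading of the minimizing-movement regularization in the abstract.
            loss = loss + kinetic_weight * r.pow(2).sum(dim=1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return block, head
```

In a sequential setup, each block would be trained this way on the frozen outputs of its predecessors; a parallel variant would train blocks concurrently on their current inputs, which is where the regularity imposed by the kinetic term is meant to help.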