Paper Title
To Fold or Not to Fold: a Necessary and Sufficient Condition on Batch-Normalization Layers Folding
Paper Authors
Paper Abstract
Batch-Normalization (BN) layers have become fundamental components of ever more complex deep neural network architectures. Such models require acceleration processes for deployment on edge devices. However, BN layers add a computational bottleneck due to their sequential processing: thus, a key yet often overlooked component of the acceleration process is BN layer folding. In this paper, we demonstrate that current BN folding approaches are suboptimal in terms of how many layers can be removed. We therefore provide a necessary and sufficient condition for BN folding and a corresponding optimal algorithm. The proposed approach systematically outperforms existing baselines and dramatically reduces the inference time of deep neural networks.
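For context, the baseline the paper improves upon is the standard folding rule that merges inference-mode BN statistics into a preceding convolution or linear layer. Below is a minimal NumPy sketch of that standard rule, not the paper's optimal algorithm; the function name `fold_bn_into_conv` and all parameter names are illustrative, not taken from the paper.

```python
import numpy as np

def fold_bn_into_conv(W, b, gamma, beta, mu, var, eps=1e-5):
    """Fold an inference-mode BN layer into the preceding conv/linear layer.

    W           : layer weights, shape (out_channels, ...)
    b           : layer bias, shape (out_channels,)
    gamma, beta : BN scale and shift, shape (out_channels,)
    mu, var     : BN running mean and variance, shape (out_channels,)

    Returns (W_folded, b_folded) such that, with frozen BN statistics,
    BN(W @ x + b) == W_folded @ x + b_folded, so the BN layer can be removed.
    """
    # Per-output-channel rescaling factor from the BN affine transform.
    scale = gamma / np.sqrt(var + eps)
    # Broadcast the per-channel scale over the remaining weight dimensions.
    W_folded = W * scale.reshape(-1, *([1] * (W.ndim - 1)))
    # The BN shift absorbs the original bias and running mean.
    b_folded = (b - mu) * scale + beta
    return W_folded, b_folded
```

This rule only applies when the BN layer directly follows an affine layer; the paper's contribution is a necessary and sufficient condition characterizing exactly which BN layers can be folded in more general architectures.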