Paper Title

Training Quantised Neural Networks with STE Variants: the Additive Noise Annealing Algorithm

Paper Authors

Matteo Spallanzani, Gian Paolo Leonardi, Luca Benini

Paper Abstract

Training quantised neural networks (QNNs) is a non-differentiable optimisation problem since weights and features are output by piecewise constant functions. The standard solution is to apply the straight-through estimator (STE), using different functions during the inference and gradient computation steps. Several STE variants have been proposed in the literature aiming to maximise the task accuracy of the trained network. In this paper, we analyse STE variants and study their impact on QNN training. We first observe that most such variants can be modelled as stochastic regularisations of stair functions; although this intuitive interpretation is not new, our rigorous discussion generalises to further variants. Then, we analyse QNNs mixing different regularisations, finding that some suitably synchronised smoothing of each layer map is required to guarantee pointwise compositional convergence to the target discontinuous function. Based on these theoretical insights, we propose additive noise annealing (ANA), a new algorithm to train QNNs encompassing standard STE and its variants as special cases. When testing ANA on the CIFAR-10 image classification benchmark, we find that the major impact on task accuracy is not due to the qualitative shape of the regularisations but to the proper synchronisation of the different STE variants used in a network, in accordance with the theoretical results.
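
To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the standard STE for a binary (sign) quantiser, together with the additive-noise view of regularisation that the abstract refers to. This is an illustrative assumption, not the authors' code: the names BinarySTE and noisy_sign, the hard-tanh surrogate, and the Gaussian noise model are placeholders, and the sketch does not reproduce the paper's exact ANA schedule.

```python
import torch


class BinarySTE(torch.autograd.Function):
    """Sign quantiser trained with the standard straight-through estimator.

    Forward: the piecewise-constant sign function (zero gradient almost everywhere).
    Backward: the gradient of a surrogate, here the clipped identity (hard tanh),
    which is the usual STE choice.
    """

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Let gradients through only where the surrogate is not flat.
        return grad_output * (x.abs() <= 1.0).to(grad_output.dtype)


def noisy_sign(x, sigma):
    # Stochastic-regularisation view: adding noise before the stair function
    # and averaging over it smooths the quantiser; annealing sigma -> 0
    # recovers the hard sign (illustrative only, not the paper's exact ANA).
    return torch.sign(x + sigma * torch.randn_like(x))


# Illustrative usage: gradients flow as if the surrogate had been applied.
x = torch.randn(4, requires_grad=True)
BinarySTE.apply(x).sum().backward()
print(x.grad)
```

In this reading, the forward/backward mismatch of the STE corresponds to using the hard stair function at inference time while differentiating a smoothed (noise-regularised) version, and annealing the noise scale towards zero recovers the discontinuous target function.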
