Paper Title
Neural Networks with Quantization Constraints
Paper Authors
Paper Abstract
Enabling low-precision implementations of deep learning models without considerable performance degradation is necessary in resource- and latency-constrained settings. Moreover, exploiting the differences in sensitivity to quantization across layers allows mixed-precision implementations to achieve a considerably better computation-performance trade-off. However, backpropagating through the quantization operation requires introducing gradient approximations, and choosing which layers to quantize is challenging for modern architectures due to the large search space. In this work, we present a constrained learning approach to quantization-aware training. We formulate low-precision supervised learning as a constrained optimization problem and show that, despite its non-convexity, the resulting problem is strongly dual and does away with gradient estimation. Furthermore, we show that the dual variables indicate the sensitivity of the objective with respect to constraint perturbations. We demonstrate that the proposed approach exhibits competitive performance on image classification tasks, and we leverage the sensitivity result to apply layer-selective quantization based on the values of the dual variables, leading to considerable performance improvements.
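To make the formulation concrete: one plausible shape of the constrained problem described above, with a per-layer constraint $c_l(\theta) \le \epsilon$ measuring how much layer $l$ suffers under quantization, is

$$\min_{\theta} \; \mathbb{E}\left[\ell(f_\theta(x), y)\right] \quad \text{s.t.} \quad c_l(\theta) \le \epsilon, \quad l = 1, \dots, L,$$

with Lagrangian $\mathcal{L}(\theta, \lambda) = \mathbb{E}[\ell(f_\theta(x), y)] + \sum_{l} \lambda_l \left(c_l(\theta) - \epsilon\right)$ and $\lambda \ge 0$. Strong duality means the saddle point $\max_{\lambda \ge 0} \min_{\theta} \mathcal{L}(\theta, \lambda)$ attains the constrained optimum, and the optimal $\lambda_l$ measures how much the objective would improve if the $l$-th constraint were loosened; this is the sensitivity property that motivates selecting layers by the value of their dual variables. The specific constraint shape is an assumption made here for illustration, and the paper's exact constraints may differ.

The following is a minimal PyTorch-style sketch of the resulting dual-ascent training loop, not the authors' implementation. All names (quantize, train_constrained, eps) are hypothetical, and the per-layer constraint, mean squared quantization error under a common slack eps, is a simple differentiable stand-in rather than the paper's formulation.

import torch
import torch.nn as nn

def quantize(w, bits=4):
    # Uniform symmetric quantization to 2^bits levels (illustrative scheme,
    # not necessarily the one used in the paper).
    scale = w.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    # Detach so quantized values are treated as locally constant: the
    # constraint gradient is then 2 * (w - q(w)), with no straight-through
    # estimator required.
    return (torch.round(w / scale) * scale).detach()

def train_constrained(model, loader, eps=1e-4, lr=1e-3, lr_dual=1e-2, epochs=5):
    layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    lam = torch.zeros(len(layers))  # one dual variable per constrained layer
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            # Primal step: gradient descent on the Lagrangian
            # L(theta, lam) = loss(theta) + sum_l lam_l * (c_l(theta) - eps).
            slacks = [(m.weight - quantize(m.weight)).pow(2).mean() - eps
                      for m in layers]
            lagrangian = loss_fn(model(x), y) + sum(
                lam[i] * s for i, s in enumerate(slacks))
            opt.zero_grad()
            lagrangian.backward()
            opt.step()
            # Dual step: gradient ascent on the constraint slacks,
            # projected back onto lam >= 0.
            with torch.no_grad():
                viol = torch.tensor([s.item() for s in slacks])
                lam = torch.clamp(lam + lr_dual * viol, min=0.0)
    return lam  # large lam_l flags a quantization-sensitive layer

After training, sorting layers by their dual variables gives a ranking for mixed-precision deployment: layers with large values of lam are the quantization-sensitive ones, and candidates to keep at higher precision.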