Paper Title
Learning Quantized Neural Nets by Coarse Gradient Method for Non-linear Classification
Paper Authors
Paper Abstract
Quantized or low-bit neural networks are attractive due to their inference efficiency. However, training deep neural networks with quantized activations involves minimizing a discontinuous and piecewise constant loss function. Such a loss function has zero gradients almost everywhere (a.e.), which makes conventional gradient-based algorithms inapplicable. To this end, we study a novel class of \emph{biased} first-order oracles, termed coarse gradients, to overcome the vanishing gradient issue. A coarse gradient is generated by replacing the a.e. zero derivative of the quantized (i.e., stair-case) ReLU activation in the chain rule with a heuristic proxy derivative known as the straight-through estimator (STE). Although STEs have been widely used in training quantized networks empirically, fundamental questions such as when and why the ad hoc STE trick works still lack theoretical understanding. In this paper, we propose a class of STEs with certain monotonicity and consider their application to training a two-linear-layer network with quantized activation functions for non-linear multi-category classification. We establish performance guarantees for the proposed STEs by showing that the corresponding coarse gradient methods converge to the global minimum, which leads to perfect classification. Lastly, we present experimental results on synthetic data as well as the MNIST dataset to verify our theoretical findings and demonstrate the effectiveness of our proposed STEs.
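To make the coarse-gradient idea in the abstract concrete, below is a minimal NumPy sketch: the forward pass uses a stair-case (quantized) ReLU whose true derivative is zero a.e., and the backward pass substitutes a proxy derivative in the chain rule. The specific multi-bit quantized ReLU, the clipped-ReLU-style proxy, and the helper names (`quant_relu`, `ste_derivative`, `coarse_grad_step`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def quant_relu(z, alpha=1.0, bits=2):
    """Stair-case (quantized) ReLU: piecewise constant, so its derivative is zero a.e."""
    levels = 2 ** bits - 1
    return alpha * np.clip(np.floor(z / alpha), 0, levels)

def ste_derivative(z, alpha=1.0, bits=2):
    """Heuristic proxy derivative (clipped-ReLU-style STE): 1 on the active range, 0 elsewhere."""
    levels = 2 ** bits - 1
    return ((z > 0) & (z < alpha * levels)).astype(float)

def coarse_grad_step(W, x, upstream_grad, alpha=1.0, bits=2, lr=0.1):
    """One coarse-gradient update for a linear layer followed by the quantized activation.

    The forward pass uses quant_relu; the backward pass replaces its a.e.-zero
    derivative by ste_derivative in the chain rule, yielding a biased but
    non-vanishing search direction.
    """
    z = W @ x                                          # pre-activation
    delta = upstream_grad * ste_derivative(z, alpha, bits)  # proxy derivative in the chain rule
    grad_W = np.outer(delta, x)                        # coarse gradient w.r.t. W
    return W - lr * grad_W                             # gradient-descent-style update

# Toy usage: a single update on random data.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
upstream = rng.standard_normal(4)   # gradient flowing back from the layers above
W_new = coarse_grad_step(W, x, upstream)
```

The key point of the sketch is the separation of roles: `quant_relu` alone defines the forward computation, while `ste_derivative` is only ever consulted during back-propagation, which is exactly how the coarse gradient remains a biased first-order oracle rather than the true (a.e. zero) gradient.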