Paper Title
Mixed-Precision Inference Quantization: Radically Towards Faster Inference Speed, Lower Storage Requirement, and Lower Loss
Paper Authors
Abstract
Model quantization, which exploits a model's resilience to computational noise, is important for compressing models and accelerating inference. Existing quantization techniques rely heavily on experience and "fine-tuning" skill, and in most cases the quantized model incurs a larger loss than its full-precision counterpart. This study provides a method for obtaining a mixed-precision quantized model whose loss is lower than that of the full-precision model. In addition, our analysis demonstrates that, throughout the inference process, the loss function is affected mainly by the noise in the layer inputs. In particular, we show that neural networks with a large number of identity mappings are resistant to quantization, and that it is also difficult to improve the performance of such networks through quantization.
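The "computational noise" introduced by quantization can be illustrated with a minimal sketch (not the paper's method): symmetric uniform quantization of a tensor at several bit widths, measuring the signal-to-quantization-noise ratio (SQNR). The `quantize` helper and the chosen bit widths are illustrative assumptions.

```python
import numpy as np

def quantize(x, n_bits=8):
    # Symmetric uniform quantizer (illustrative): map x onto signed
    # n_bits integer levels, then dequantize back to float.
    scale = np.max(np.abs(x)) / (2 ** (n_bits - 1) - 1)
    q = np.round(x / scale)
    return q * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

for bits in (8, 4, 2):
    noise = quantize(x, bits) - x
    sqnr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))
    print(f"{bits}-bit SQNR: {sqnr_db:.1f} dB")
```

Lower bit widths inject more noise into each layer's output, which becomes the next layer's input; the abstract's claim is that the loss is driven mainly by this input noise.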