Paper Title
Genie: Show Me the Data for Quantization
Paper Authors
Paper Abstract
Zero-shot quantization is a promising approach for developing lightweight deep neural networks when data is inaccessible for reasons such as cost and privacy concerns. Zero-shot quantization schemes generate synthetic data by exploiting the learned parameters ($μ$ and $σ$) of the batch normalization layers in an FP32 pre-trained model. They then distill knowledge from the pre-trained model (teacher) to the quantized model (student) so that the quantized model can be optimized with the synthetic dataset. However, thus far, zero-shot quantization has primarily been discussed in the context of quantization-aware training methods, which require task-specific losses and optimization as lengthy as retraining. We thus introduce a post-training quantization scheme for zero-shot quantization that produces high-quality quantized networks within a few hours. Furthermore, we propose a framework called Genie that generates data suited for quantization. With the data synthesized by Genie, we can produce robust quantized models without real datasets, with performance comparable to few-shot quantization. We also propose a post-training quantization algorithm to enhance the performance of quantized models. By combining them, we can bridge the gap between zero-shot and few-shot quantization while significantly improving the quantization performance compared to that of existing approaches. In other words, we obtain a state-of-the-art zero-shot quantization approach. The code is available at https://github.com/SamsungLabs/Genie.
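
To make the role of the batch normalization parameters concrete, the sketch below shows one common way that zero-shot methods score synthetic data: the per-channel statistics of the activations produced by a synthetic batch are compared with the stored $μ$ and $σ$. This is an illustrative assumption, not Genie's actual objective; the function names `channel_stats` and `bn_stat_loss`, the squared-error form, and the toy `[sample][channel]` layout are all hypothetical choices made for this example.

```python
from statistics import fmean, pvariance


def channel_stats(batch):
    """Return per-channel (means, stds) for a batch laid out as [sample][channel]."""
    channels = list(zip(*batch))  # transpose to [channel][sample]
    means = [fmean(c) for c in channels]
    stds = [pvariance(c) ** 0.5 for c in channels]  # population std, as in BN
    return means, stds


def bn_stat_loss(batch, bn_mean, bn_std):
    """Squared distance between batch statistics and stored BN statistics.

    Assumption: bn_mean and bn_std hold one value per channel, taken from a
    batch normalization layer of the pre-trained FP32 model.
    """
    means, stds = channel_stats(batch)
    mean_term = sum((m - t) ** 2 for m, t in zip(means, bn_mean))
    std_term = sum((s - t) ** 2 for s, t in zip(stds, bn_std))
    return mean_term + std_term


# Toy activations: 2 samples, 2 channels.
batch = [[0.0, 2.0], [2.0, 4.0]]
matched = bn_stat_loss(batch, bn_mean=[1.0, 3.0], bn_std=[1.0, 1.0])
shifted = bn_stat_loss(batch, bn_mean=[0.0, 3.0], bn_std=[1.0, 1.0])
```

In a real pipeline this quantity would be summed over every batch normalization layer and minimized with respect to the synthetic inputs (or a generator producing them) by gradient descent; the plain-Python version here only evaluates the loss for a fixed batch.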