Paper Title
An Empirical Study of Low Precision Quantization for TinyML
Paper Authors
Paper Abstract
Tiny machine learning (tinyML) has emerged over the past few years, aiming to deploy machine learning models on embedded AI processors with highly constrained memory and computation capacity. Low precision quantization is an important model compression technique that can greatly reduce both the memory consumption and the computation cost of model inference. In this study, we focus on post-training quantization (PTQ) algorithms that quantize a model to low-bit (less than 8-bit) precision using only a small set of calibration data, and we benchmark them on different tinyML use cases. To enable a fair comparison, we build a simulated quantization framework to investigate recent PTQ algorithms. Furthermore, we break those algorithms down into essential components and re-assemble them into a generic PTQ pipeline. Through ablation studies on alternative choices for each component in the pipeline, we reveal the key design choices when performing low precision quantization. We hope this work can provide useful data points and shed light on future research into low precision quantization.
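For context on the "simulated quantization" the abstract refers to, below is a minimal sketch of calibration-based fake quantization (quantize-then-dequantize) in NumPy. The function names, the min-max calibration rule, and the 4-bit setting are illustrative assumptions for exposition, not the paper's actual framework or pipeline.

```python
import numpy as np

def calibrate_minmax(x: np.ndarray, n_bits: int = 4):
    """Derive an asymmetric uniform quantizer (scale, zero-point)
    from the min/max range of a small calibration batch.
    Min-max is one common calibration heuristic; PTQ papers also
    use MSE- or percentile-based range estimation."""
    qmin, qmax = 0, 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # guard against zero range
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def fake_quantize(x: np.ndarray, scale: float, zero_point: int, n_bits: int = 4):
    """Simulated (fake) quantization: round to the low-bit integer grid,
    then immediately dequantize back to float, so quantization error is
    modeled while all arithmetic stays in floating point."""
    qmin, qmax = 0, 2 ** n_bits - 1
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# Usage: estimate the quantizer on a small calibration batch,
# then apply it to weights or activations and inspect the error.
calib = np.random.randn(256).astype(np.float32)
scale, zp = calibrate_minmax(calib, n_bits=4)
x_sim = fake_quantize(calib, scale, zp, n_bits=4)
print("max abs quantization error:", np.abs(calib - x_sim).max())
```

Because the dequantized values are ordinary floats, such a simulator lets different PTQ algorithms be compared at arbitrary bit widths without integer kernels for the target hardware.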