Paper Title
Automatic Pruning for Quantized Neural Networks
Paper Authors
Paper Abstract
Neural network quantization and pruning are two techniques commonly used to reduce the computational complexity and memory footprint of deep neural networks for deployment. However, most existing pruning strategies operate on full-precision weights and cannot be directly applied to the discrete parameter distributions that arise after quantization. In contrast, we study the combination of these two techniques to achieve further network compression. In particular, we propose an effective pruning strategy for selecting redundant low-precision filters. Furthermore, we leverage Bayesian optimization to efficiently determine the pruning ratio for each layer. We conduct extensive experiments on CIFAR-10 and ImageNet with various architectures and precisions. Notably, for ResNet-18 on ImageNet, we prune 26.12% of the model size with Binarized Neural Network quantization, achieving a top-1 classification accuracy of 47.32% with a 2.47 MB model, and 59.30% with a 2-bit DoReFa-Net at 4.36 MB.
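
To see why full-precision pruning criteria break down after quantization, consider the common L1-norm filter criterion: once weights are binarized to {-1, +1}, every filter of the same shape has an identical norm, so the ranking collapses. The toy NumPy snippet below is an illustration of this point, not code from the paper.

```python
# Toy illustration (not from the paper) of why a full-precision criterion
# such as the L1 filter norm degenerates after binarization: every binary
# filter has entries in {-1, +1}, so all filters share the same norm.
import numpy as np

rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 3, 3, 3))  # 8 full-precision 3x3x3 filters
binary = np.sign(filters)                    # BNN-style sign quantization

fp_norms = np.abs(filters).reshape(8, -1).sum(axis=1)
bin_norms = np.abs(binary).reshape(8, -1).sum(axis=1)

print("full-precision L1 norms:", np.round(fp_norms, 2))  # distinct values
print("binary L1 norms:        ", bin_norms)              # all equal to 27.0
```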
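
The abstract names Bayesian optimization as the mechanism for choosing per-layer pruning ratios but gives no implementation details. The sketch below shows one way such a search loop could look using scikit-optimize's `gp_minimize`; the toy objective, the 0.05 size/accuracy trade-off weight, the [0, 0.8] ratio bounds, and the layer count are all illustrative assumptions, not the paper's actual setup.

```python
# A minimal sketch (not the paper's code) of Bayesian optimization over
# per-layer pruning ratios, using scikit-optimize. The objective is a toy
# stand-in for "validation accuracy of the pruned quantized network".
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

N_LAYERS = 4  # illustrative; e.g., the prunable conv layers of a small net

def objective(ratios):
    """Toy surrogate: gp_minimize minimizes, so return the negative of
    (accuracy - size penalty). In practice this would prune the quantized
    network with `ratios` and evaluate it on a validation set."""
    r = np.asarray(ratios)
    accuracy = 0.70 - 0.30 * np.mean(r ** 2)  # pretend accuracy drops with pruning
    size = 1.0 - np.mean(r)                   # relative model size after pruning
    return -(accuracy - 0.05 * size)          # trade-off weight is an assumption

# One search dimension per layer: the fraction of filters to prune.
space = [Real(0.0, 0.8, name=f"ratio_layer_{i}") for i in range(N_LAYERS)]
result = gp_minimize(objective, space, n_calls=30, random_state=0)
print("best per-layer pruning ratios:", np.round(result.x, 3))
```

In a real pipeline, the toy objective would be replaced by actually pruning the quantized network with the candidate ratios and measuring validation accuracy. This is what makes Bayesian optimization attractive here: each evaluation is expensive, and a Gaussian-process surrogate keeps the number of evaluations small.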