Paper Title

Searching for Winograd-aware Quantized Networks

Paper Authors

Javier Fernandez-Marques, Paul N. Whatmough, Andrew Mundy, Matthew Mattina

Paper Abstract

Lightweight architectural designs of Convolutional Neural Networks (CNNs), together with quantization, have paved the way for the deployment of demanding computer vision applications on mobile devices. In parallel, alternative formulations of the convolution operation, such as FFT, Strassen, and Winograd, have been adapted for use in CNNs, offering further speedups. Winograd convolutions are the fastest known algorithm for spatially small convolutions, but exploiting their full potential comes with the burden of numerical error, rendering them unusable in quantized contexts. In this work we propose a Winograd-aware formulation of convolution layers which exposes the numerical inaccuracies introduced by the Winograd transformations to the learning of the model parameters, enabling the design of competitive quantized models without impacting model size. We also address the source of the numerical error and propose a relaxation on the form of the transformation matrices, resulting in up to 10% higher classification accuracy on CIFAR-10. Finally, we propose wiNAS, a neural architecture search (NAS) framework that jointly optimizes a given macro-architecture for accuracy and latency, leveraging Winograd-aware layers. A Winograd-aware ResNet-18 optimized with wiNAS for CIFAR-10 results in a 2.66× speedup compared to im2row, one of the most widely used optimized convolution implementations, with no loss in accuracy.
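
For intuition, the sketch below (not the authors' code) shows a 1D Winograd F(2, 3) convolution in NumPy, using the standard transform matrices from Lavin & Gray: the input tile d and filter g are mapped into the transform domain by Bᵀ and G, multiplied element-wise, and mapped back by Aᵀ, producing two outputs of a 3-tap filter with 4 multiplications instead of 6. The 2D case nests the same transforms: Y = Aᵀ[(G g Gᵀ) ⊙ (Bᵀ d B)]A. For larger tiles such as F(4, 3) or F(6, 3), the transform matrices contain fractional entries of growing magnitude, which is the source of the numerical error, amplified under quantization, that the paper's Winograd-aware formulation exposes to training. The function name winograd_f23 is ours, for illustration only.

```python
# A minimal sketch of 1D Winograd F(2, 3) convolution (correlation),
# using the standard transform matrices from Lavin & Gray:
#   y = A^T [ (G g) ⊙ (B^T d) ]
import numpy as np

# Input-tile transform B^T, filter transform G, output transform A^T
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)

def winograd_f23(d, g):
    """Two outputs of a 3-tap filter g over a 4-element input tile d,
    computed with 4 multiplies in the Winograd transform domain."""
    return AT @ ((G @ g) * (BT @ d))  # element-wise product, then inverse transform

d = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
g = np.array([0.5, 1.0, -1.0], dtype=np.float32)

direct = np.array([d[0:3] @ g, d[1:4] @ g])  # direct sliding-window correlation
print(winograd_f23(d, g), direct)            # both print [-0.5, 0.0]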
