Paper Title

ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Paper Authors

Haokui Zhang, Wenze Hu, Xiaoyu Wang

Paper Abstract

Recently, vision transformers have started to show impressive results that significantly outperform large convolution-based models. However, in the area of small models for mobile or resource-constrained devices, ConvNets still have their own advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone model that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position aware circular convolution (ParC), a light-weight convolution op which boasts a global receptive field while producing location-sensitive features as in local convolutions. We combine ParC and squeeze-excitation ops to form a meta-former-like model block, which further has an attention-like mechanism as in transformers. The aforementioned block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experimental results show that the proposed ParC-Net achieves better performance than popular light-weight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% of the parameters and 13% of the computational cost while gaining 0.2% higher accuracy and 23% faster inference speed (on an ARM-based Rockchip RK3288) compared with MobileViT, and it uses only 0.5x the parameters of DeiT while gaining 2.7% higher accuracy. On MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance. Source code is available at https://github.com/hkzhang91/ParC-Net
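To make the ParC op concrete, below is a minimal PyTorch sketch of its vertical branch, assuming a fixed input resolution: a depth-wise circular convolution whose kernel spans the full feature-map height gives a global receptive field, and a learned position embedding keeps the output location sensitive. The class and parameter names (ParCVertical, pos_embed) are illustrative assumptions, not the authors' released implementation; consult the linked repository for the reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParCVertical(nn.Module):
    """Sketch of a position aware circular convolution (ParC) along the
    vertical axis. A full ParC-Net block would pair this with a horizontal
    counterpart and squeeze-excitation ops."""

    def __init__(self, channels: int, height: int):
        super().__init__()
        # Depth-wise kernel spanning the full feature-map height, so every
        # output row aggregates the whole column: a global receptive field.
        self.weight = nn.Parameter(torch.randn(channels, 1, height, 1) * 0.02)
        # Learned position embedding restores location sensitivity, which a
        # purely circular (periodic) convolution would otherwise lose.
        self.pos_embed = nn.Parameter(torch.zeros(1, channels, height, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, c, h, _ = x.shape  # expects h == configured height
        x = x + self.pos_embed
        # Circular padding wraps the bottom rows back to the top, making the
        # convolution periodic over the full height.
        x = F.pad(x, (0, 0, 0, h - 1), mode="circular")
        return F.conv2d(x, self.weight, groups=c)

x = torch.randn(2, 64, 14, 14)          # (batch, channels, H, W)
print(ParCVertical(64, 14)(x).shape)    # torch.Size([2, 64, 14, 14])
```

Note the design trade-off this sketch illustrates: the kernel size is tied to the input resolution, which is why ParC-Net applies such ops at later stages where feature maps are small.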
