GhostNets on Heterogeneous Devices via Cheap Operations

Paper title

GhostNets on Heterogeneous Devices via Cheap Operations

Authors

Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chunjing Xu, Enhua Wu, Qi Tian

Abstract

Deploying convolutional neural networks (CNNs) on mobile devices is difficult because memory and computation resources are limited. We aim to design efficient neural networks for heterogeneous devices, including CPU and GPU, by exploiting the redundancy in feature maps, which has rarely been investigated in neural architecture design. For CPU-like devices, we propose a novel CPU-efficient Ghost (C-Ghost) module that generates more feature maps from cheap operations. Starting from a set of intrinsic feature maps, we apply a series of low-cost linear transformations to generate many ghost feature maps that can fully reveal the information underlying the intrinsic features. The proposed C-Ghost module can be used as a plug-and-play component to upgrade existing convolutional neural networks. C-Ghost bottlenecks are designed to stack C-Ghost modules, and the lightweight C-GhostNet can then be easily established. We further consider efficient networks for GPU devices. To avoid involving too many GPU-inefficient operations (e.g., depth-wise convolution) in a building stage, we propose to use stage-wise feature redundancy to formulate a GPU-efficient Ghost (G-Ghost) stage structure. The features in a stage are split into two parts: the first part is processed by the original block with fewer output channels to generate intrinsic features, and the other part is generated by cheap operations that exploit stage-wise redundancy. Experiments conducted on benchmarks demonstrate the effectiveness of the proposed C-Ghost module and G-Ghost stage. C-GhostNet and G-GhostNet can achieve the optimal trade-off of accuracy and latency for CPU and GPU, respectively. Code is available at https://github.com/huawei-noah/CV-Backbones.
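To make the idea of "more feature maps from cheap operations" concrete, the following is a rough cost sketch, not the authors' implementation. It assumes that the cheap linear transformation is a d x d filter applied to one intrinsic map per ghost map (depth-wise style), and that a fraction 1/ratio of the output channels are intrinsic. The function names and the parameters `ratio` and `d` are illustrative assumptions, not taken from the abstract.

```python
def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulates of an ordinary k x k convolution (bias ignored)."""
    return c_out * h_out * w_out * c_in * k * k


def ghost_macs(c_in, c_out, k, h_out, w_out, ratio=2, d=3):
    """Approximate cost of a Ghost-style layer producing c_out maps.

    Assumptions (hypothetical, for illustration only):
      * c_out // ratio maps are intrinsic and come from an ordinary convolution;
      * every remaining map is a ghost map produced by one d x d filter
        applied to a single intrinsic map.
    """
    intrinsic = c_out // ratio
    ghost = c_out - intrinsic
    primary = conv_macs(c_in, intrinsic, k, h_out, w_out)
    cheap = ghost * h_out * w_out * d * d  # one d x d filter per ghost map
    return primary + cheap


if __name__ == "__main__":
    # Example layer: 64 -> 128 channels, 3 x 3 kernel, 56 x 56 output.
    full = conv_macs(64, 128, 3, 56, 56)
    ghost = ghost_macs(64, 128, 3, 56, 56, ratio=2, d=3)
    print(f"ordinary conv MACs: {full}")
    print(f"ghost-style MACs:   {ghost}")
    print(f"reduction factor:   {full / ghost:.2f}")
```

Under these assumptions the reduction factor approaches `ratio` when the input channel count is large, because the cheap operations cost far less per output map than the ordinary convolution.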

Paper title