Paper Title
Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification
Paper Authors
Paper Abstract
Classifying the sub-categories of an object within the same super-category (e.g., bird species, car models, and aircraft models) in fine-grained visual classification (FGVC) relies heavily on discriminative feature representation and accurate region localization. Existing approaches mainly focus on distilling information from high-level features. In this paper, however, we show that by integrating low-level information (e.g., color, edge junctions, texture patterns), performance can be improved through enhanced feature representation and accurately located discriminative regions. Our solution, named Attention Pyramid Convolutional Neural Network (AP-CNN), consists of a) a pyramidal hierarchy structure with a top-down feature pathway and a bottom-up attention pathway, which learns both high-level semantic and low-level detailed feature representations, and b) an ROI guided refinement strategy with ROI guided dropblock and ROI guided zoom-in, which refines features with discriminative local regions enhanced and background noise eliminated. The proposed AP-CNN can be trained end-to-end, without the need for additional bounding-box/part annotations. Extensive experiments on three commonly used FGVC datasets (CUB-200-2011, Stanford Cars, and FGVC-Aircraft) demonstrate that our approach achieves state-of-the-art performance. Code available at \url{http://dwz1.cc/ci8so8a}
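To make component a) of the abstract concrete, below is a minimal PyTorch sketch of a pyramidal hierarchy with an FPN-style top-down feature pathway and a bottom-up spatial-attention pathway. The class name `AttentionPyramid`, the channel widths, and the sigmoid single-channel attention form are illustrative assumptions, not the authors' exact implementation; consult the released code at the URL above for the real architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionPyramid(nn.Module):
    """Sketch: top-down feature pathway + bottom-up attention pathway.
    Channel sizes and the attention form are assumptions of this sketch."""

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convs project backbone stages to a common width.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )
        # 3x3 convs smooth the merged top-down features.
        self.smooths = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels
        )
        # One-channel spatial attention per pyramid level.
        self.spatial_atts = nn.ModuleList(
            nn.Conv2d(out_channels, 1, kernel_size=3, padding=1)
            for _ in in_channels
        )

    def forward(self, feats):
        # feats: backbone features ordered fine -> coarse, e.g. (C3, C4, C5).
        laterals = [l(f) for l, f in zip(self.laterals, feats)]
        # Top-down pathway: upsample coarser features, add to finer laterals.
        for i in range(len(laterals) - 2, -1, -1):
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        pyramid = [s(p) for s, p in zip(self.smooths, laterals)]
        # Bottom-up attention pathway: propagate attention from the finest
        # level upward so low-level detail guides the coarser levels.
        atts, prev = [], None
        for att_conv, p in zip(self.spatial_atts, pyramid):
            a = torch.sigmoid(att_conv(p))
            if prev is not None:
                prev = F.max_pool2d(prev, kernel_size=2)
                if prev.shape[-2:] != a.shape[-2:]:
                    prev = F.interpolate(prev, size=a.shape[-2:], mode="nearest")
                a = a * prev
            atts.append(a)
            prev = a
        # Return attention-weighted pyramid features.
        return [p * a for p, a in zip(pyramid, atts)]

if __name__ == "__main__":
    # Shapes mimic ResNet-50 stages C3-C5 on a 224x224 input.
    c3 = torch.randn(1, 512, 28, 28)
    c4 = torch.randn(1, 1024, 14, 14)
    c5 = torch.randn(1, 2048, 7, 7)
    outs = AttentionPyramid()((c3, c4, c5))
    print([o.shape for o in outs])
```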
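Component b), the ROI guided refinement, can likewise be sketched. The two helpers below show one plausible reading: dropblock zeroes the features inside the most discriminative ROI so the network must attend to complementary parts, and zoom-in crops that ROI from the input image for a second, background-free forward pass. How the ROI is obtained (e.g., from peaks of the attention maps) and the exact drop/rescale scheme are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def roi_guided_dropblock(feat, roi, drop_prob=0.5):
    """Sketch: with probability drop_prob, zero the feature cells inside the
    most discriminative ROI. `roi` = (x1, y1, x2, y2) in feature-map
    coordinates; the ROI source is an assumption of this sketch."""
    if torch.rand(1).item() < drop_prob:
        x1, y1, x2, y2 = roi
        mask = torch.ones_like(feat)
        mask[..., y1:y2, x1:x2] = 0.0
        # Rescale (as in dropout) to keep the expected activation magnitude.
        keep = mask.mean().clamp(min=1e-6)
        feat = feat * mask / keep
    return feat

def roi_guided_zoom_in(image, roi, out_size=(224, 224)):
    """Sketch: crop the ROI from the input image (pixel coordinates) and
    resize it back to training resolution for a refined forward pass."""
    x1, y1, x2, y2 = roi
    crop = image[..., y1:y2, x1:x2]
    return F.interpolate(crop, size=out_size, mode="bilinear",
                         align_corners=False)
```

In training, one would typically apply `roi_guided_dropblock` to the pyramid features of the first pass and feed the output of `roi_guided_zoom_in` through the shared backbone for a second classification loss; both helpers are hypothetical names introduced here for illustration.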