Paper Title

Coarse to Fine: Multi-label Image Classification with Global/Local Attention

Authors

Lyu, Fan; Hu, Fuyuan; Sheng, Victor S.; Wu, Zhengtian; Fu, Qiming; Fu, Baochuan

Abstract

In daily life, the scenes around us almost always carry multiple labels, especially in a smart city, where information about city operations must be recognized in order to respond and control. Great efforts have been made to recognize multi-label images with deep neural networks. Because multi-label image classification is very complicated, researchers have turned to attention mechanisms to guide the classification process. However, conventional attention-based methods analyze images directly and aggressively, which makes it difficult for them to understand complicated scenes well. In this paper, we propose a global/local attention method that recognizes an image from coarse to fine by mimicking how humans observe images. Specifically, our global/local attention method first concentrates on the whole image and then focuses on specific local objects in the image. We also propose a joint max-margin objective function, which enforces that the minimum score of positive labels should be larger than the maximum score of negative labels, both horizontally and vertically. This function further improves our multi-label image classification method. We evaluate the effectiveness of our method on two popular multi-label image datasets, Pascal VOC and MS-COCO. Our experimental results show that our method outperforms state-of-the-art methods.
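The joint max-margin idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no formula, so the function name `joint_max_margin_loss`, the `margin` parameter, the hinge form, and the reading of "horizontally" as across labels within one image and "vertically" as across images within one label are all assumptions made for illustration.

```python
import numpy as np


def joint_max_margin_loss(scores, labels, margin=1.0):
    """Hinge-style sketch (assumed form, not taken from the paper).

    scores: (batch, num_labels) array of label scores.
    labels: (batch, num_labels) array of 0/1 ground-truth indicators.
    Penalizes cases where the smallest positive score does not exceed
    the largest negative score by at least `margin`.
    """
    scores = np.asarray(scores, dtype=float)
    pos = np.asarray(labels).astype(bool)

    def directional(axis):
        # Smallest positive score and largest negative score along `axis`.
        min_pos = np.where(pos, scores, np.inf).min(axis=axis)
        max_neg = np.where(~pos, scores, -np.inf).max(axis=axis)
        # Skip rows/columns that lack either a positive or a negative label.
        valid = np.isfinite(min_pos) & np.isfinite(max_neg)
        hinge = np.maximum(0.0, margin + max_neg[valid] - min_pos[valid])
        return hinge.sum()

    horizontal = directional(axis=1)  # assumed: across labels, per image
    vertical = directional(axis=0)    # assumed: across images, per label
    return float(horizontal + vertical)
```

Under these assumptions, calling `joint_max_margin_loss(scores, labels)` on a batch of scores returns zero only when, in every image and for every label, each positive score beats each competing negative score by at least the margin.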
