Paper Title

Graph Attention Transformer Network for Multi-Label Image Classification

Authors

Jin Yuan, Shikai Chen, Yao Zhang, Zhongchao Shi, Xin Geng, Jianping Fan, Yong Rui

Abstract

Multi-label classification aims to recognize multiple objects or attributes in an image. However, learning a suitable label graph that effectively characterizes inter-label correlations or dependencies remains challenging. Current methods often use label co-occurrence probabilities computed on the training set as the adjacency matrix for modeling these correlations; this ties the matrix closely to the dataset and limits the model's generalization ability. In this paper, we propose the Graph Attention Transformer Network (GATN), a general framework for multi-label image classification that can effectively mine complex inter-label relationships. First, we use the cosine similarity between label word embeddings as the initial correlation matrix, which can represent rich semantic information. Subsequently, we design a graph attention transformer layer that adapts this adjacency matrix to the current domain. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on three datasets.
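The first step in the abstract, building an initial correlation matrix from the cosine similarity of label word embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cosine_similarity_matrix`, the label names, and the 3-dimensional toy vectors are assumptions made purely for illustration (real word embeddings would have far more dimensions), and the graph attention transformer layer that later adapts the matrix is not shown.

```python
import math


def cosine_similarity_matrix(embeddings):
    """Return the pairwise cosine-similarity matrix for a list of vectors.

    Hypothetical helper for illustration; assumes every vector is non-zero.
    """
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    if any(norm == 0.0 for norm in norms):
        raise ValueError("cosine similarity is undefined for zero vectors")

    n = len(embeddings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            matrix[i][j] = dot / (norms[i] * norms[j])
    return matrix


# Toy label embeddings (made-up values, not taken from any real embedding model).
label_names = ["dog", "cat", "car"]
label_embeddings = [
    [1.0, 0.0, 0.0],  # "dog"
    [1.0, 1.0, 0.0],  # "cat": shares a direction with "dog"
    [0.0, 0.0, 1.0],  # "car": orthogonal to both
]

# Initial label correlation (adjacency) matrix, before any learned adaptation.
initial_adjacency = cosine_similarity_matrix(label_embeddings)

for name, row in zip(label_names, initial_adjacency):
    print(name, [round(value, 3) for value in row])
```

Unlike a co-occurrence matrix, this matrix depends only on the label embeddings and not on how often labels appear together in a particular training set, which is the motivation the abstract gives for using it as a starting point.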
