Learning Disentangled Label Representations for Multi-label Classification

Paper title

Learning Disentangled Label Representations for Multi-label Classification

Authors

Jian Jia, Fei He, Naiyu Gao, Xiaotang Chen, Kaiqi Huang

Abstract

Although various methods have been proposed for multi-label classification, most approaches still follow the feature learning mechanism of single-label (multi-class) classification, namely, learning a shared image feature to classify multiple labels. However, we find that this One-shared-Feature-for-Multiple-Labels (OFML) mechanism is not conducive to learning discriminative label features and makes the model non-robust. For the first time, we mathematically prove that the inferiority of the OFML mechanism is that, when minimizing the cross-entropy loss, the optimal learned image feature cannot maintain high similarity with multiple classifiers simultaneously. To address the limitations of the OFML mechanism, we introduce the One-specific-Feature-for-One-Label (OFOL) mechanism and propose a novel disentangled label feature learning (DLFL) framework to learn a disentangled representation for each label. The specificity of the framework lies in a feature disentanglement module, which contains learnable semantic queries and a Semantic Spatial Cross-Attention (SSCA) module. Specifically, the learnable semantic queries maintain semantic consistency between different images of the same label. The SSCA module localizes the label-related spatial regions and aggregates the features of the located regions into the corresponding label feature to achieve feature disentanglement. We achieve state-of-the-art performance on eight datasets across three tasks, i.e., multi-label classification, pedestrian attribute recognition, and continual multi-label learning.
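
The one-feature-per-label idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention, uses fixed query vectors where the paper's semantic queries are learned, and omits any projection layers. The function names `label_cross_attention` and `ofol_logits` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def label_cross_attention(queries, spatial_feats):
    """Pool one feature per label from a grid of spatial features.

    queries:       K vectors of size D, one per label (assumed learnable
                   in a real model; plain constants in this sketch).
    spatial_feats: N vectors of size D, one per spatial location.
    Returns (label_feats, attn): K vectors of size D, and K attention
    rows of length N that each sum to 1.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    label_feats, attn = [], []
    for query in queries:
        # Each label attends over all spatial locations.
        weights = softmax([dot(query, f) * scale for f in spatial_feats])
        # The label feature is the attention-weighted sum of locations.
        pooled = [
            sum(w * f[d] for w, f in zip(weights, spatial_feats))
            for d in range(dim)
        ]
        label_feats.append(pooled)
        attn.append(weights)
    return label_feats, attn


def ofol_logits(label_feats, classifiers):
    """Score label k with its own feature and its own classifier vector."""
    return [dot(f, w) for f, w in zip(label_feats, classifiers)]


# Toy input: two labels, two spatial locations, two channels.
queries = [[10.0, 0.0], [0.0, 10.0]]
spatial_feats = [[1.0, 0.0], [0.0, 1.0]]
label_feats, attn = label_cross_attention(queries, spatial_feats)
logits = ofol_logits(label_feats, classifiers=[[1.0, 0.0], [0.0, 1.0]])
```

In this toy input each query matches a different location, so each label's attention concentrates on its own region and the two label features differ. A single shared feature, as in the OFML setting, would have to serve both classifiers at once.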
