Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Paper Title

Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Authors

Tao Pu, Tianshui Chen, Hefeng Wu, Liang Lin

Abstract

Training multi-label image recognition models with partial labels, in which only some labels are known and the others are unknown for each image, is a challenging and practical task. To address it, current algorithms mainly depend on pre-trained classification or similarity models to generate pseudo labels for the unknown labels. However, these algorithms need sufficient multi-label annotations to train those models, which leads to poor performance, especially when the known label proportion is low. In this work, we propose to blend category-specific representations across different images to transfer information from known labels to complement unknown labels, which removes the need for pre-trained models and thus does not depend on sufficient annotations. To this end, we design a unified semantic-aware representation blending (SARB) framework that exploits instance-level and prototype-level semantic representations to complement unknown labels through two complementary modules: 1) an instance-level representation blending (ILRB) module blends the representations of the known labels in one image into the representations of the unknown labels in another image to complement these unknown labels; 2) a prototype-level representation blending (PLRB) module learns more stable representation prototypes for each category and blends the representations of unknown labels with the prototypes of the corresponding labels to complement these labels. Extensive experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed SARB framework outperforms current leading competitors on all known label proportion settings, e.g., with mAP improvements of 4.6%, 4.%, 2.2% on these three datasets when the known label proportion is 10%. Code is available at https://github.com/HCPLab-SYSU/HCP-MLR-PL.
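The two blending modules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the label encoding (1 = known positive, -1 = known negative, 0 = unknown), the fixed blending weights `alpha` and `beta`, and the choice to mark blended labels as positive in the instance-level case are all assumptions made for illustration. Each image is assumed to have one representation vector per category, stored as an array of shape `(num_categories, dim)`.

```python
import numpy as np


def blend_instance_level(feat_n, labels_n, feat_m, labels_m, alpha=0.5):
    """Hypothetical ILRB-style step: for categories that are unknown in image n
    but known positive in image m, mix m's representation into n's."""
    feat_n = feat_n.copy()
    labels_n = labels_n.copy()
    # Categories where n is unknown (0) and m is a known positive (1).
    mask = (labels_n == 0) & (labels_m == 1)
    feat_n[mask] = alpha * feat_n[mask] + (1.0 - alpha) * feat_m[mask]
    # Assumption: the blended slot is treated as positive afterwards.
    labels_n[mask] = 1
    return feat_n, labels_n


def blend_prototype_level(feat_n, labels_n, prototypes, beta=0.5):
    """Hypothetical PLRB-style step: mix the representation of every unknown
    category in image n with that category's prototype vector."""
    feat_n = feat_n.copy()
    mask = labels_n == 0  # unknown categories only
    feat_n[mask] = beta * feat_n[mask] + (1.0 - beta) * prototypes[mask]
    return feat_n


# Toy example: 3 categories, 2-dimensional representations.
feat_n = np.zeros((3, 2))
feat_m = np.ones((3, 2))
labels_n = np.array([1, 0, 0])    # categories 1 and 2 are unknown in image n
labels_m = np.array([1, 1, -1])   # category 1 is a known positive in image m
prototypes = np.full((3, 2), 2.0)  # stand-in for learned category prototypes

ilrb_feat, ilrb_labels = blend_instance_level(feat_n, labels_n, feat_m, labels_m)
plrb_feat = blend_prototype_level(feat_n, labels_n, prototypes)
```

In this toy setup only category 1 is blended at the instance level, because it is the only category that is unknown in image n and known positive in image m. How the prototypes are learned, and how the blended representations enter the training loss, is not specified in the abstract and is left out of the sketch.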
