Title
Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels
Authors
Abstract
Despite achieving impressive progress, current multi-label image recognition (MLR) algorithms depend heavily on large-scale datasets with complete labels, and collecting such datasets is extremely time-consuming and labor-intensive. Training multi-label image recognition models with partial labels (MLR-PL), in which only some labels are known while the others are unknown for each image, is an alternative. However, current MLR-PL algorithms rely on pre-trained image similarity models or iteratively updated image classification models to generate pseudo labels for the unknown labels. Thus, they depend on a certain amount of annotations and inevitably suffer from obvious performance drops, especially when the known label proportion is low. To address this dilemma, we propose a dual-perspective semantic-aware representation blending (DSRB) framework that blends multi-granularity, category-specific semantic representations across different images, from the instance and prototype perspectives respectively, to transfer information from known labels and complement unknown labels. Specifically, an instance-perspective representation blending (IPRB) module is designed to blend the representations of the known labels in one image with the representations of the corresponding unknown labels in another image to complement those unknown labels. Meanwhile, a prototype-perspective representation blending (PPRB) module is introduced to learn more stable representation prototypes for each category and to blend the representations of unknown labels with the prototypes of the corresponding categories, in a location-sensitive manner, to complement those unknown labels. Extensive experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed DSRB consistently outperforms current state-of-the-art algorithms under all known label proportion settings.
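The abstract describes two blending operations: IPRB mixes the category-specific representation of a label that is known in one image with the representation of the same (unknown) label in another image, while PPRB maintains a per-category prototype and mixes unknown-label representations with it. A minimal sketch of these two ideas follows, assuming a mixup-style linear interpolation and an exponential-moving-average prototype update; the coefficient names `alpha` and `momentum` and all function names are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of the two blending perspectives described in the
# abstract. Representations are plain lists of floats for simplicity;
# `alpha` and `momentum` are assumed hyperparameters.

def blend_instance(f_known, f_unknown, alpha=0.5):
    """IPRB-style blending: interpolate the representation of a known label
    (from one image) with the representation of the corresponding unknown
    label (from another image)."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(f_known, f_unknown)]

def update_prototype(prototype, f_known, momentum=0.9):
    """PPRB-style prototype maintenance: keep a running, more stable
    per-category prototype by exponential moving average over the
    representations of images where the category is annotated."""
    return [momentum * p + (1.0 - momentum) * f for p, f in zip(prototype, f_known)]

# Toy example: 4-dim category-specific features for one category.
f_known = [1.0, 0.0, 2.0, 4.0]    # image where the label is annotated
f_unknown = [3.0, 2.0, 0.0, 0.0]  # image where the label is unknown
blended = blend_instance(f_known, f_unknown, alpha=0.5)
# blended == [2.0, 1.0, 1.0, 2.0]

prototype = [0.0, 0.0, 0.0, 0.0]
prototype = update_prototype(prototype, f_known, momentum=0.9)
blended_with_proto = blend_instance(prototype, f_unknown, alpha=0.5)
```

The blended representations would then serve as supplementary training signals for the unknown labels; the paper's location-sensitive variant of PPRB is not modeled in this sketch.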