Paper Title

Self-supervised Semantic Segmentation Grounded in Visual Concepts

Authors

Wenbin He, William Surmeier, Arvind Kumar Shekar, Liang Gou, Liu Ren

Abstract

Unsupervised semantic segmentation requires assigning a label to every pixel without any human annotations. Despite recent advances in self-supervised representation learning for individual images, unsupervised semantic segmentation with pixel-level representations is still a challenging task and remains underexplored. In this work, we propose a self-supervised pixel representation learning method for semantic segmentation by using visual concepts (i.e., groups of pixels with semantic meanings, such as parts, objects, and scenes) extracted from images. To guide self-supervised learning, we leverage three types of relationships between pixels and concepts, including the relationships between pixels and local concepts, local and global concepts, as well as the co-occurrence of concepts. We evaluate the learned pixel embeddings and visual concepts on three datasets, including PASCAL VOC 2012, COCO 2017, and DAVIS 2017. Our results show that the proposed method gains consistent and substantial improvements over recent unsupervised semantic segmentation approaches, and also demonstrate that visual concepts can reveal insights into image datasets.
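
The abstract only sketches the method at a high level. As a rough, hypothetical illustration of the first relationship it mentions (pixels to local concepts), the PyTorch snippet below pulls each pixel embedding toward the prototype of its assigned local concept with a contrastive objective. All names here (PixelEncoder, pixel_to_concept_loss, the concept count and temperature) are illustrative assumptions, not the paper's actual implementation, and the local-global and co-occurrence relationships are omitted.

```python
# Minimal sketch, assuming a contrastive pixel-to-local-concept objective.
# Not the paper's implementation; names and hyperparameters are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEncoder(nn.Module):
    """Toy fully convolutional encoder producing a D-dim embedding per pixel."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, padding=1),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)  # unit-norm pixel embeddings

def pixel_to_concept_loss(emb, assign, num_concepts, temperature=0.1):
    """Contrastive loss pulling each pixel toward its local concept prototype.

    emb:    (B, D, H, W) pixel embeddings
    assign: (B, H, W) integer concept id per pixel (e.g., from clustering)
    """
    B, D, H, W = emb.shape
    flat = emb.permute(0, 2, 3, 1).reshape(-1, D)  # (B*H*W, D)
    ids = assign.reshape(-1)                       # (B*H*W,)
    # Prototype = mean embedding of the pixels assigned to each concept.
    protos = torch.zeros(num_concepts, D, device=emb.device)
    protos.index_add_(0, ids, flat)
    counts = torch.bincount(ids, minlength=num_concepts).clamp(min=1)
    protos = F.normalize(protos / counts.unsqueeze(1), dim=1)
    # Each pixel should be most similar to its own concept's prototype.
    logits = flat @ protos.t() / temperature
    return F.cross_entropy(logits, ids)

# Usage: in practice the assignments would come from clustering pixel
# features (e.g., k-means); random ids here just exercise the code path.
imgs = torch.randn(2, 3, 64, 64)
model = PixelEncoder()
emb = model(imgs)
assign = torch.randint(0, 8, (2, 64, 64))  # placeholder concept assignments
loss = pixel_to_concept_loss(emb, assign, num_concepts=8)
loss.backward()
```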
