Paper Title
Semantic-Aware Fine-Grained Correspondence
Paper Authors
Paper Abstract
Establishing visual correspondence across images is a challenging and essential task. Recently, an influx of self-supervised methods has been proposed to better learn representations for visual correspondence. However, we find that these methods often fail to leverage semantic information and over-rely on the matching of low-level features. In contrast, human vision is capable of distinguishing between distinct objects as a pretext to tracking. Inspired by this paradigm, we propose to learn semantic-aware fine-grained correspondence. First, we demonstrate that semantic correspondence is implicitly available through a rich set of image-level self-supervised methods. We further design a pixel-level self-supervised learning objective that specifically targets fine-grained correspondence. For downstream tasks, we fuse these two kinds of complementary correspondence representations, demonstrating that they boost performance synergistically. Our method surpasses previous state-of-the-art self-supervised methods using convolutional networks on a variety of visual correspondence tasks, including video object segmentation, human pose tracking, and human part tracking.