Paper Title
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
Paper Authors
Paper Abstract
Self-supervised learning (SSL) holds promise in leveraging large amounts of unlabeled data. However, the success of popular SSL methods has been limited to single-centric-object images like those in ImageNet, ignoring the correlation between scenes and instances as well as the semantic differences among instances within a scene. To address these problems, we propose Unified Self-supervised Visual Pre-training (UniVIP), a novel self-supervised framework for learning versatile visual representations on either single-centric-object or non-iconic datasets. The framework accounts for representation learning at three levels: 1) scene-scene similarity, 2) scene-instance correlation, and 3) instance-instance discrimination. During learning, we adopt an optimal transport algorithm to automatically measure the discrimination of instances. Extensive experiments show that UniVIP pre-trained on the non-iconic COCO dataset achieves state-of-the-art transfer performance on a variety of downstream tasks, such as image classification, semi-supervised learning, object detection, and segmentation. Furthermore, our method can also exploit single-centric-object datasets such as ImageNet: it outperforms BYOL by 2.5% in linear probing with the same number of pre-training epochs, and it surpasses current self-supervised object detection methods on the COCO dataset, demonstrating its universality and potential.
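The abstract states that an optimal transport algorithm measures instance-instance discrimination. As a rough illustration only (the paper's released code may differ), the sketch below shows how an entropy-regularized Sinkhorn-Knopp solver could compute a soft matching between instance embeddings taken from two augmented views of the same scene; the function name `sinkhorn`, the toy embeddings, and all hyperparameters (`eps`, `iters`) are hypothetical choices, not values from the paper.

```python
# Hypothetical sketch, not the authors' implementation: entropy-regularized
# optimal transport (Sinkhorn-Knopp) between instance embeddings from two
# augmented views, in the spirit of UniVIP's instance-instance discrimination.
import torch
import torch.nn.functional as F

def sinkhorn(cost, eps=0.05, iters=50):
    """Solve entropy-regularized OT with uniform marginals.

    cost: (n, m) pairwise cost matrix between instance embeddings.
    Returns an (n, m) transport plan whose entries sum to 1.
    """
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n)       # uniform source marginal
    nu = torch.full((m,), 1.0 / m)       # uniform target marginal
    K = torch.exp(-cost / eps)           # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.t() @ u)             # column scaling
        u = mu / (K @ v)                 # row scaling
    return torch.diag(u) @ K @ torch.diag(v)

# Toy usage: 4 instance embeddings per view, cosine distance as the cost.
z1 = F.normalize(torch.randn(4, 128), dim=1)  # instances from view 1
z2 = F.normalize(torch.randn(4, 128), dim=1)  # instances from view 2
cost = 1.0 - z1 @ z2.t()                      # cosine distance matrix
plan = sinkhorn(cost)
print(plan.sum())  # ~1.0: a valid doubly-scaled transport plan
```

The transport plan can then serve as a soft correspondence weight between instance pairs; how those weights enter the training loss is specific to the paper and is not reproduced here.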