VICRegL: Self-Supervised Learning of Local Visual Features

Title

VICRegL: Self-Supervised Learning of Local Visual Features

Authors

Adrien Bardes, Jean Ponce, Yann LeCun

Abstract

Most recent self-supervised methods for learning image representations focus on either producing a global feature with invariance properties, or producing a set of local features. The former works best for classification tasks while the latter is best for detection and segmentation tasks. This paper explores the fundamental trade-off between learning local and global features. A new method called VICRegL is proposed that learns good global and local features simultaneously, yielding excellent performance on detection and segmentation tasks while maintaining good performance on classification tasks. Concretely, two identical branches of a standard convolutional net architecture are fed two differently distorted versions of the same image. The VICReg criterion is applied to pairs of global feature vectors. Simultaneously, the VICReg criterion is applied to pairs of local feature vectors occurring before the last pooling layer. Two local feature vectors are attracted to each other if their l2-distance is below a threshold or if their relative locations are consistent with a known geometric transformation between the two input images. We demonstrate strong performance on linear classification and segmentation transfer tasks. Code and pretrained models are publicly available at: https://github.com/facebookresearch/VICRegL
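The local matching rule in the abstract can be illustrated with a small sketch. This is not the released implementation: the function names, the toy vectors, and the threshold of 0.5 are all assumptions made for illustration, and the cell positions are assumed to have already been mapped back into a shared image frame using the known crop and flip. Only the attraction term is shown; the full VICReg criterion also includes variance and covariance regularization terms, which are omitted here.

```python
import math

def l2(u, v):
    """Euclidean (l2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_pairs(items_a, items_b, threshold):
    """Pair each item of A with its nearest item of B (hypothetical helper).

    A pair (i, j) is kept only if the l2 distance is below `threshold`.
    """
    pairs = []
    for i, u in enumerate(items_a):
        j = min(range(len(items_b)), key=lambda k: l2(u, items_b[k]))
        if l2(u, items_b[j]) < threshold:
            pairs.append((i, j))
    return pairs

def attraction(feats_a, feats_b, pairs):
    """Mean squared l2 distance over matched pairs (0.0 if there are none)."""
    if not pairs:
        return 0.0
    return sum(l2(feats_a[i], feats_b[j]) ** 2 for i, j in pairs) / len(pairs)

# Toy data (assumed): three local feature vectors per view, plus the position
# of each feature-map cell expressed in the original image frame.
feats_a = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
feats_b = [[0.0, 1.1], [1.0, 0.1], [9.0, 9.0]]
pos_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pos_b = [(0.1, 0.0), (1.1, 0.0), (8.0, 0.0)]

# Feature-based matching: vectors that are close in embedding space.
feature_pairs = nearest_pairs(feats_a, feats_b, threshold=0.5)
# Location-based matching: cells that cover nearly the same image location.
location_pairs = nearest_pairs(pos_a, pos_b, threshold=0.5)

local_attraction = (attraction(feats_a, feats_b, feature_pairs)
                    + attraction(feats_a, feats_b, location_pairs))
print(feature_pairs, location_pairs, local_attraction)
```

In this toy example the two rules select different pairs: feature-based matching links vectors that already look alike, while location-based matching links cells that see the same image region even when their features still differ, so the second term contributes most of the attraction.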
