Paper Title

P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding

Paper Authors

Yunze Liu, Li Yi, Shanghang Zhang, Qingnan Fan, Thomas Funkhouser, Hao Dong

Paper Abstract

Self-supervised representation learning is a critical problem in computer vision, as it provides a way to pretrain feature extractors on large unlabeled datasets that can be used as an initialization for more efficient and effective training on downstream tasks. A promising approach is to use contrastive learning to learn a latent space where features are close for similar data samples and far apart for dissimilar ones. This approach has demonstrated tremendous success for pretraining both image and point cloud feature extractors, but it has been barely investigated for multi-modal RGB-D scans, especially with the goal of facilitating high-level scene understanding. To solve this problem, we propose contrasting "pairs of point-pixel pairs", where positives include pairs of RGB-D points in correspondence, and negatives include pairs where one of the two modalities has been disturbed and/or the two RGB-D points are not in correspondence. This provides extra flexibility in making hard negatives and helps networks to learn features from both modalities, not just the more discriminating one of the two. Experiments show that this proposed approach yields better performance on three large-scale RGB-D scene understanding benchmarks (ScanNet, SUN RGB-D, and 3RScan) than previous pretraining approaches.
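To make the idea concrete, below is a minimal NumPy sketch of an InfoNCE-style contrastive objective in the spirit of the abstract: embeddings of corresponding point-pixel pairs act as positives, while out-of-correspondence pairs act as negatives. The function name, the toy features, and the loss formulation are illustrative assumptions, not the authors' actual implementation (which would also include the modality-disturbance negatives described above).

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """L2-normalize feature vectors along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def point_pixel_contrastive_loss(feat_a, feat_b, temperature=0.07):
    """InfoNCE-style loss over pairs of point-pixel pairs (a sketch).

    feat_a[i] and feat_b[i] are embeddings of the SAME point-pixel pair
    (a positive); feat_b[j], j != i, serve as negatives, standing in for
    pairs that are out of correspondence.
    """
    feat_a, feat_b = normalize(feat_a), normalize(feat_b)
    logits = feat_a @ feat_b.T / temperature      # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_prob))

# Toy usage: 8 point-pixel pairs embedded in a 16-D latent space.
N, D = 8, 16
feat_a = rng.normal(size=(N, D))
feat_b = feat_a + 0.01 * rng.normal(size=(N, D))   # near-identical positives

loss_aligned = point_pixel_contrastive_loss(feat_a, feat_b)
# Breaking the correspondence (rolling the rows) should raise the loss.
loss_mismatched = point_pixel_contrastive_loss(feat_a, np.roll(feat_b, 1, axis=0))
assert loss_aligned < loss_mismatched
```

The sketch shows only the correspondence-based negatives; the paper's extra flexibility comes from additionally disturbing one of the two modalities within a pair to manufacture hard negatives.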
