Paper Title


Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences

Paper Authors

Longlong Jing, Yucheng Chen, Ling Zhang, Mingyi He, Yingli Tian

Paper Abstract


The success of supervised learning requires large-scale ground-truth labels, which are very expensive, time-consuming, or may need special skills to annotate. To address this issue, many self- or un-supervised methods have been developed. Unlike most existing self-supervised methods, which learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach that jointly learns both 2D image features and 3D point cloud features by exploiting cross-modality and cross-view correspondences, without using any human-annotated labels. Specifically, 2D image features of rendered images from different views are extracted by a 2D convolutional neural network, and 3D point cloud features are extracted by a graph convolutional neural network. The two types of features are fed into a two-layer fully connected neural network to estimate the cross-modality correspondence. The three networks are jointly trained (i.e., cross-modality) by verifying whether two sampled data of different modalities belong to the same object; meanwhile, the 2D convolutional neural network is additionally optimized by minimizing the intra-object distance while maximizing the inter-object distance of rendered images from different views (i.e., cross-view). The effectiveness of the learned 2D and 3D features is evaluated by transferring them to five different tasks: multi-view 2D shape recognition, 3D shape recognition, multi-view 2D shape retrieval, 3D shape retrieval, and 3D part segmentation. Extensive evaluations on all five tasks across different datasets demonstrate the strong generalization and effectiveness of the 2D and 3D features learned by the proposed self-supervised method.
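The two training objectives described in the abstract can be sketched as below. This is a minimal, illustrative sketch, not the paper's implementation: the function names are invented here, the cross-modality term is assumed to be a standard binary cross-entropy on the correspondence prediction, and the cross-view term is assumed to take a triplet-style hinge form (the abstract specifies only that intra-object distance is minimized while inter-object distance is maximized).

```python
import math

def correspondence_loss(p_same, is_same_object):
    """Cross-modality term (assumed binary cross-entropy): p_same is the
    predicted probability that a rendered 2D image and a 3D point cloud
    come from the same object; is_same_object is the ground-truth pairing."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p_same, eps), 1.0 - eps)
    return -math.log(p) if is_same_object else -math.log(1.0 - p)

def cross_view_loss(d_intra, d_inter, margin=1.0):
    """Cross-view term (assumed triplet-style hinge): push the distance
    between views of the same object (d_intra) below the distance between
    views of different objects (d_inter) by at least `margin`."""
    return max(0.0, d_intra - d_inter + margin)
```

In this formulation, a correct, confident correspondence prediction and a well-separated pair of view embeddings both drive their respective losses toward zero, matching the joint training described in the abstract.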
