Paper Title
Self-supervised Modal and View Invariant Feature Learning
Paper Authors
Paper Abstract
Most existing self-supervised feature learning methods for 3D data learn 3D features either from point cloud data or from multi-view images. By exploring the inherent multi-modality attributes of 3D objects, in this paper we propose to jointly learn modal-invariant and view-invariant features from different modalities of 3D data, including images, point clouds, and meshes, with heterogeneous networks. To learn modal- and view-invariant features, we propose two types of constraints: a cross-modal invariance constraint and a cross-view invariance constraint. The cross-modal invariance constraint forces the network to maximize the agreement between features of different modalities of the same object, while the cross-view invariance constraint forces the network to maximize the agreement between features of different image views of the same object. The quality of the learned features has been tested on different downstream tasks with three modalities of data: point clouds, multi-view images, and meshes. Furthermore, the invariance across different modalities and views is evaluated with the cross-modal retrieval task. Extensive evaluation results demonstrate that the learned features are robust and generalize well across different tasks.
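The abstract does not specify the exact form of the agreement objective. As an illustration only, the minimal PyTorch sketch below shows one common way to maximize feature agreement between two modalities (or two views) of the same objects, using a standard symmetric contrastive loss as a stand-in; the function name agreement_loss, the temperature value, and the feature tensors are all hypothetical and not taken from the paper.

    import torch
    import torch.nn.functional as F

    def agreement_loss(feat_a, feat_b, temperature=0.1):
        # feat_a, feat_b: (N, D) features of the same N objects from two
        # heterogeneous encoders (e.g., image branch and point-cloud branch),
        # row-aligned so that row i in both tensors describes the same object.
        a = F.normalize(feat_a, dim=1)
        b = F.normalize(feat_b, dim=1)
        logits = a @ b.t() / temperature               # pairwise cosine similarities
        targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
        # Symmetric cross-entropy: pull same-object pairs together,
        # push different objects apart, in both matching directions.
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    # Usage example: a batch of 8 objects with 128-D features per branch.
    img_feat = torch.randn(8, 128)   # hypothetical image-branch output
    pc_feat = torch.randn(8, 128)    # hypothetical point-cloud-branch output
    loss = agreement_loss(img_feat, pc_feat)

The same loss applied to two augmented image views of each object would play the role of the cross-view invariance constraint described above.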