Paper Title

PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models

Paper Authors

Minghua Liu, Yinhao Zhu, Hong Cai, Shizhong Han, Zhan Ling, Fatih Porikli, Hao Su

Abstract


Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud rendering and a novel 2D-to-3D label lifting algorithm. We also utilize multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
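The abstract describes lifting 2D part detections from multiple rendered views into 3D point labels. The paper's actual lifting algorithm is more involved (it leverages multi-view 3D priors), but the core voting idea can be illustrated with a minimal sketch. Everything here is hypothetical: the function name `lift_labels`, the per-view projection functions, and the box format are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of 2D-to-3D label lifting (NOT the authors' exact
# algorithm): each view contributes votes from its 2D part detections,
# and every 3D point takes the part label with the most votes.

def lift_labels(points, views, num_parts):
    """points: (N, 3) array of 3D point coordinates.
    views: list of (proj_fn, boxes) pairs, where proj_fn maps an
    (N, 3) array to (N, 2) pixel coordinates for that view, and
    boxes is a list of (part_id, x_min, y_min, x_max, y_max)
    2D detections in that view.
    Returns an (N,) array of per-point part labels."""
    votes = np.zeros((len(points), num_parts), dtype=int)
    for proj_fn, boxes in views:
        uv = proj_fn(points)  # project all points into this view
        for part_id, x0, y0, x1, y1 in boxes:
            inside = ((uv[:, 0] >= x0) & (uv[:, 0] <= x1) &
                      (uv[:, 1] >= y0) & (uv[:, 1] <= y1))
            votes[inside, part_id] += 1  # the box votes for covered points
    return votes.argmax(axis=1)  # majority label per point
```

A real pipeline would also need visibility checks (a point occluded in a view should not receive that view's votes) and instance grouping, which this sketch omits.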
