Paper Title
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
Paper Authors
Paper Abstract
Object recognition has seen significant progress in the image domain, with focus primarily on 2D perception. We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image by constructing a CAD-based representation of the objects and their poses. We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose. We construct a joint embedding space between detected image regions corresponding to objects and 3D CAD models, enabling retrieval of CAD models for an input RGB image. This produces a clean, lightweight representation of the objects in an image; this CAD-based representation ensures a valid, efficient shape representation for applications such as content creation or interactive scenarios, and takes a step toward understanding the transformation of real-world imagery to a synthetic domain. Experiments on real-world images from Pix3D demonstrate the advantage of our approach in comparison to the state of the art. To facilitate future research, we additionally propose a new image-to-3D baseline on ScanNet, which features larger shape diversity, real-world occlusions, and challenging image views.
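The core retrieval step described in the abstract — matching an embedded image region against a database of embedded CAD models — can be sketched as a nearest-neighbor lookup in the joint embedding space. The snippet below is a minimal illustration, not the paper's implementation: the embedding dimensions and vectors are hypothetical placeholders, and cosine similarity is assumed as the matching metric.

```python
import numpy as np

def retrieve_cad(region_embedding, cad_embeddings):
    """Return the index of the CAD model whose embedding is most
    similar (by cosine similarity) to the detected region's embedding."""
    q = region_embedding / np.linalg.norm(region_embedding)
    db = cad_embeddings / np.linalg.norm(cad_embeddings, axis=1, keepdims=True)
    similarities = db @ q          # cosine similarity to each CAD model
    return int(np.argmax(similarities))

# Toy example: 4 hypothetical CAD embeddings (orthogonal unit vectors)
# and a region embedding that lies closest to CAD model 2.
cad_db = np.eye(4)
query = np.array([0.1, 0.05, 0.9, 0.0])
print(retrieve_cad(query, cad_db))  # → 2
```

In practice the embeddings on both sides would come from learned encoders trained so that image regions and their matching CAD models map to nearby points; the retrieval itself remains this simple nearest-neighbor query.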