Paper Title
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
Paper Authors
Paper Abstract
Object recognition has seen significant progress in the image domain, with focus primarily on 2D perception. We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image by constructing a CAD-based representation of the objects and their poses. We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose. We construct a joint embedding space between detected image regions corresponding to objects and 3D CAD models, enabling retrieval of CAD models for an input RGB image. This produces a clean, lightweight representation of the objects in an image; this CAD-based representation ensures a valid, efficient shape representation for applications such as content creation or interactive scenarios, and takes a step toward understanding the transformation of real-world imagery to a synthetic domain. Experiments on real-world images from Pix3D demonstrate the advantage of our approach in comparison to the state of the art. To facilitate future research, we additionally propose a new image-to-3D baseline on ScanNet, which features larger shape diversity, real-world occlusions, and challenging image views.
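The core retrieval step described in the abstract — matching an embedded image region against a database of embedded CAD models — can be sketched as a nearest-neighbor lookup in the joint embedding space. The snippet below is a minimal illustration, not the paper's implementation: the embedding dimensions and vectors are hypothetical placeholders, and cosine similarity is assumed as the matching metric.

```python
import numpy as np

def retrieve_cad(region_embedding, cad_embeddings):
    """Return the index of the CAD model whose embedding is most
    similar (by cosine similarity) to the detected region's embedding."""
    q = region_embedding / np.linalg.norm(region_embedding)
    db = cad_embeddings / np.linalg.norm(cad_embeddings, axis=1, keepdims=True)
    similarities = db @ q          # cosine similarity to each CAD model
    return int(np.argmax(similarities))

# Toy example: 4 hypothetical CAD embeddings (orthogonal unit vectors)
# and a region embedding that lies closest to CAD model 2.
cad_db = np.eye(4)
query = np.array([0.1, 0.05, 0.9, 0.0])
print(retrieve_cad(query, cad_db))  # → 2
```

In practice the embeddings on both sides would come from learned encoders trained so that image regions and their matching CAD models map to nearby points; the retrieval itself remains this simple nearest-neighbor query.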