Paper Title

Learning 3D Object Shape and Layout without 3D Supervision

Authors

Georgia Gkioxari, Nikhila Ravi, Justin Johnson

Abstract

A 3D scene consists of a set of objects, each with a shape and a layout giving their position in space. Understanding 3D scenes from 2D images is an important goal, with applications in robotics and graphics. While there have been recent advances in predicting 3D shape and layout from a single image, most approaches rely on 3D ground truth for training which is expensive to collect at scale. We overcome these limitations and propose a method that learns to predict 3D shape and layout for objects without any ground truth shape or layout information: instead we rely on multi-view images with 2D supervision which can more easily be collected at scale. Through extensive experiments on 3D Warehouse, Hypersim, and ScanNet we demonstrate that our approach scales to large datasets of realistic images, and compares favorably to methods relying on 3D ground truth. On Hypersim and ScanNet where reliable 3D ground truth is not available, our approach outperforms supervised approaches trained on smaller and less diverse datasets.
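The core idea of training with 2D supervision from multiple views can be illustrated with a small sketch: predicted object geometry and layout are projected into each calibrated view, and the loss compares the projections against 2D observations, with no 3D ground truth involved. This is a minimal NumPy sketch of that idea, not the authors' implementation; the function names, the point-based representation, and the squared-error loss are illustrative assumptions.

```python
import numpy as np

def project(points, K, R, t):
    """Project Nx3 world points to pixels with a pinhole camera (K, R, t)."""
    cam = points @ R.T + t           # world frame -> camera frame
    uvw = cam @ K.T                  # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide -> Nx2 pixels

def multiview_2d_loss(shape_pts, layout_t, views):
    """Mean squared 2D reprojection error over all views.

    shape_pts: Nx3 points of the predicted object shape (object frame).
    layout_t:  3-vector placing the object in the world (its layout).
    views:     list of (K, R, t, observed_2d) tuples; observed_2d is the
               2D supervision for that view (hypothetical point targets
               standing in for the paper's 2D signals).
    """
    total = 0.0
    for K, R, t, obs in views:
        pred = project(shape_pts + layout_t, K, R, t)
        total += np.mean((pred - obs) ** 2)
    return total / len(views)
```

With perfect shape and layout the projections match the observations in every view and the loss is zero; a wrong layout raises the loss in at least one view, which is what lets multi-view 2D supervision constrain 3D position without 3D labels.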
