Paper Title

Neural 3D Scene Reconstruction with the Manhattan-world Assumption

Paper Authors

Haoyu Guo, Sida Peng, Haotong Lin, Qianqian Wang, Guofeng Zhang, Hujun Bao, Xiaowei Zhou

Paper Abstract

This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Many previous works have shown impressive reconstruction results on textured objects, but they still have difficulty in handling low-textured planar regions, which are common in indoor scenes. An approach to solving this issue is to incorporate planar constraints into the depth map estimation in multi-view stereo-based methods, but the per-view plane estimation and depth optimization lack both efficiency and multi-view consistency. In this work, we show that the planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. Specifically, we use an MLP network to represent the signed distance function as the scene geometry. Based on the Manhattan-world assumption, planar constraints are employed to regularize the geometry in floor and wall regions predicted by a 2D semantic segmentation network. To resolve the inaccurate segmentation, we encode the semantics of 3D points with another MLP and design a novel loss that jointly optimizes the scene geometry and semantics in 3D space. Experiments on ScanNet and 7-Scenes datasets show that the proposed method outperforms previous methods by a large margin on 3D reconstruction quality. The code is available at https://zju3dv.github.io/manhattan_sdf.
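To make the described regularization concrete, here is a minimal sketch of how a Manhattan-world planar loss over an SDF MLP and a semantic MLP could look. It assumes a PyTorch-style setup in which a hypothetical `sdf_mlp` maps 3D points to signed distances and a hypothetical `semantic_mlp` predicts (floor, wall, other) logits; the function names and exact weighting are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch of a Manhattan-world planar regularizer (assumptions:
# sdf_mlp(points) -> (N, 1) signed distances, semantic_mlp(points) -> (N, 3)
# logits for floor / wall / other; not the paper's official code).
import torch
import torch.nn.functional as F


def sdf_normals(sdf_mlp, points):
    """Surface normals as the normalized gradient of the signed distance field."""
    points = points.requires_grad_(True)
    sdf = sdf_mlp(points)
    grad = torch.autograd.grad(sdf.sum(), points, create_graph=True)[0]
    return F.normalize(grad, dim=-1)


def manhattan_loss(sdf_mlp, semantic_mlp, points):
    """Joint geometry-semantics regularizer on sampled surface points.

    Floor points are pushed to have normals aligned with the up axis, wall
    points to have normals orthogonal to it. Soft semantic probabilities
    weight both terms, so the semantics can be corrected jointly with the
    geometry during optimization.
    """
    up = points.new_tensor([0.0, 0.0, 1.0])
    normals = sdf_normals(sdf_mlp, points)        # (N, 3)
    probs = semantic_mlp(points).softmax(dim=-1)  # (N, 3): floor, wall, other
    p_floor, p_wall = probs[:, 0], probs[:, 1]

    cos_up = (normals * up).sum(dim=-1)           # cosine with the up axis
    loss_floor = p_floor * (1.0 - cos_up)         # floor: normal // up
    loss_wall = p_wall * cos_up.abs()             # wall: normal perpendicular to up
    return (loss_floor + loss_wall).mean()
```

In the paper the wall term additionally encourages alignment with a small set of optimizable horizontal directions; the sketch above keeps only the orthogonality-to-up part for brevity.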
