Paper Title

Neural Sparse Voxel Fields

Authors

Lingjie Liu, Jiatao Gu, Kyaw Zaw Lin, Tat-Seng Chua, Christian Theobalt

Abstract


Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.
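The speed-up described in the abstract comes from restricting ray-marching samples to voxels that actually contain scene content. The following is a minimal sketch of that idea, not the authors' implementation: occupied voxels are kept as a set of integer grid coordinates (a stand-in for the paper's sparse voxel octree), each is intersected with the ray via the standard slab test, and samples are only placed inside the resulting intervals, so empty space contributes no network queries. Function names, the voxel size, and the step size are illustrative assumptions.

```python
def ray_aabb(origin, direction, box_min, box_max):
    """Slab-method ray/axis-aligned-box intersection.

    Returns (t_near, t_far) along the ray, or None if the ray misses the box.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-9:
            # Ray parallel to this slab: must already lie between its planes.
            if o < lo or o > hi:
                return None
        else:
            t0, t1 = (lo - o) / d, (hi - o) / d
            if t0 > t1:
                t0, t1 = t1, t0
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far:
                return None
    return t_near, t_far


def march_samples(origin, direction, occupied, voxel_size=1.0, step=0.25):
    """Collect sample distances t only inside occupied voxels.

    `occupied` is a set of integer (i, j, k) voxel coordinates; empty
    regions between occupied voxels are skipped entirely.
    """
    samples = []
    for (i, j, k) in occupied:
        lo = (i * voxel_size, j * voxel_size, k * voxel_size)
        hi = (lo[0] + voxel_size, lo[1] + voxel_size, lo[2] + voxel_size)
        hit = ray_aabb(origin, direction, lo, hi)
        if hit is not None:
            t = hit[0]
            while t <= hit[1]:
                samples.append(t)  # in NSVF, each sample would query the field
                t += step
    return sorted(samples)
```

For example, a ray entering at x = -1 and traversing occupied voxels at (0, 0, 0) and (3, 0, 0) receives samples only in t ∈ [1, 2] and t ∈ [4, 5]; the empty gap in between is never sampled. The real method additionally organizes voxels in an octree so that candidate voxels are found hierarchically rather than by looping over all of them.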
