Paper Title


NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Paper Authors

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng

Paper Abstract


We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(θ, ϕ)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
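As a concrete illustration of the two ideas the abstract describes, the sketch below pairs a small fully-connected network that maps a 5D coordinate $(x, y, z, θ, ϕ)$ to volume density and RGB radiance with the standard differentiable volume rendering quadrature $C = \sum_i T_i \, (1 - e^{-\sigma_i \delta_i}) \, c_i$, where $T_i = \exp(-\sum_{j<i} \sigma_j \delta_j)$. This is a minimal PyTorch sketch under simplifying assumptions, not the authors' released implementation: the layer sizes, the raw (un-encoded) position and direction inputs, the near/far bounds, and the `TinyRadianceField` and `composite` names are all illustrative.

```python
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    """Toy MLP from a 5D coordinate (x, y, z, theta, phi) to (density, RGB).

    The paper's actual network uses positional encoding and a larger
    architecture; this flat MLP only illustrates the input/output contract.
    """
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(5, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # -> (density, r, g, b)
        )

    def forward(self, xyz, view_dir):
        # xyz: (N, 3) sample positions; view_dir: (N, 2) as (theta, phi).
        out = self.mlp(torch.cat([xyz, view_dir], dim=-1))
        sigma = torch.relu(out[..., 0])    # density must be non-negative
        rgb = torch.sigmoid(out[..., 1:])  # colors constrained to [0, 1]
        return sigma, rgb

def composite(sigma, rgb, deltas):
    """Discrete volume rendering along one ray:
        C = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
        T_i = exp(-sum_{j<i} sigma_j * delta_j).
    sigma: (S,) densities, rgb: (S, 3) colors, deltas: (S,) sample spacings.
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)  # per-sample opacity
    # Shifted cumulative product gives the transmittance T_i (T_1 = 1).
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0
    )
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(dim=0)  # final pixel color, (3,)

# Render one ray: sample S points between assumed near/far bounds,
# query the field, and composite the samples into a pixel color.
S = 64
near, far = 2.0, 6.0
t = torch.linspace(near, far, S)                   # depths along the ray
origin = torch.zeros(3)
direction = torch.tensor([0.0, 0.0, 1.0])
xyz = origin + t[:, None] * direction              # (S, 3) sample positions
view = torch.zeros(S, 2)                           # (theta, phi), constant per ray

model = TinyRadianceField()
sigma, rgb = model(xyz, view)
pixel = composite(sigma, rgb, deltas=torch.full((S,), (far - near) / S))
```

Because every operation in `composite` is differentiable, a photometric loss (e.g. mean squared error) between rendered pixels and the posed training images can be backpropagated directly into the network weights, which is the differentiability property the abstract relies on.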
