3D-Aware Encoding for Style-based Neural Radiance Fields

Paper Title

3D-Aware Encoding for Style-based Neural Radiance Fields

Authors

Yu-Jhe Li, Tao Xu, Bichen Wu, Ningyuan Zheng, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani

Abstract

We tackle the task of NeRF inversion for style-based neural radiance fields (e.g., StyleNeRF). In this task, we aim to learn an inversion function that projects an input image into the latent space of a NeRF generator and then synthesizes novel views of the original image from the resulting latent code. Compared with GAN inversion for 2D generative models, NeRF inversion must not only 1) preserve the identity of the input image, but also 2) ensure 3D consistency across the generated novel views. This requires the latent code obtained from a single-view image to be invariant across multiple views. To address this new challenge, we propose a two-stage encoder for style-based NeRF inversion. In the first stage, we introduce a base encoder that converts the input image to a latent code. To ensure that the latent code is view-invariant and can synthesize 3D-consistent novel-view images, we train the base encoder with identity contrastive learning. In the second stage, to better preserve the identity of the input image, we introduce a refining encoder that refines the latent code and adds finer details to the output image. Importantly, the novelty of this model lies in the design of its first-stage encoder, which produces the closest latent code lying on the latent manifold, so that the refinement in the second stage stays close to the NeRF manifold. Through extensive experiments, we demonstrate that our proposed two-stage encoder outperforms existing encoders for inversion, both qualitatively and quantitatively, in image reconstruction and novel-view rendering.
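
The abstract does not give the exact form of the identity contrastive objective, so the following is only an illustrative sketch of one common choice, an InfoNCE-style loss. It assumes that latent codes of the same identity seen from two views form a positive pair, while codes of other identities in the batch serve as negatives. The function names, the cosine similarity measure, and the temperature value are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two latent codes given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identity_contrastive_loss(codes_view_a, codes_view_b, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not the paper's definition).

    codes_view_a[i] and codes_view_b[i] are latent codes of the same
    identity rendered from two different views (positive pair); every
    codes_view_b[j] with j != i acts as a negative for codes_view_a[i].
    """
    n = len(codes_view_a)
    total = 0.0
    for i in range(n):
        logits = [
            cosine(codes_view_a[i], codes_view_b[j]) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of picking the matching identity.
        total += log_denominator - logits[i]
    return total / n


# Toy check: view-invariant codes score a much lower loss than mismatched ones.
view_a = [[1.0, 0.0], [0.0, 1.0]]
loss_invariant = identity_contrastive_loss(view_a, [[1.0, 0.0], [0.0, 1.0]])
loss_mismatched = identity_contrastive_loss(view_a, [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed objective, the loss is small only when an identity's codes from different views agree with each other more than with codes of other identities, which is the view-invariance property the base encoder is trained for.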
