IC3D: Image-Conditioned 3D Diffusion for Shape Generation

Paper Title

IC3D: Image-Conditioned 3D Diffusion for Shape Generation

Authors

Cristian Sbrolli, Paolo Cudrano, Matteo Frosi, Matteo Matteucci

Abstract

In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have demonstrated exceptional performance in various 2D generative tasks. Following this success, DDPMs have been extended to 3D shape generation, where they surpass previous methods. While many of these models are unconditional, some have explored guidance from other modalities. In particular, image guidance for 3D generation has been explored using CLIP embeddings. However, these embeddings are designed to align images with text and do not necessarily capture the specific details needed for shape generation. To address this limitation and give image-guided 3D DDPMs a stronger understanding of 3D structure, we introduce CISP (Contrastive Image-Shape Pre-training), which yields a well-structured joint image-shape embedding space. Building on CISP, we then introduce IC3D, a DDPM that uses CISP guidance to generate 3D shapes from single-view images. This generative diffusion model outperforms existing baselines in both the quality and the diversity of the generated 3D shapes. Moreover, despite IC3D's generative nature, human evaluators prefer its generated shapes over those of a competitive single-view 3D reconstruction model. These properties contribute to a coherent embedding space, which enables latent interpolation and conditioned generation even from out-of-distribution images. We find that IC3D generates coherent and diverse completions even when presented with occluded views, making it applicable to controlled real-world scenarios.
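The abstract does not spell out CISP's training objective. Because the name mirrors CLIP (Contrastive Language-Image Pre-training), the following is a minimal NumPy sketch that assumes a CLIP-style symmetric InfoNCE loss between image and shape embeddings. The function names, the temperature of 0.07, and the use of plain embedding matrices in place of real image and 3D shape encoders are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Project each row onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_image_shape_loss(img_emb: np.ndarray,
                                 shape_emb: np.ndarray,
                                 temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of N matched (image, shape) pairs.

    Assumption: a CLIP-style objective; the IC3D abstract does not confirm this.
    Row i of img_emb and row i of shape_emb describe the same object, and every
    other row in the batch acts as a negative.
    """
    img = l2_normalize(img_emb)
    shp = l2_normalize(shape_emb)
    logits = img @ shp.T / temperature  # (N, N) scaled cosine-similarity matrix
    idx = np.arange(logits.shape[0])
    # Image -> shape: each image should select its own shape within its row.
    loss_i2s = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Shape -> image: each shape should select its own image within its column.
    loss_s2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_i2s + loss_s2i))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shapes = rng.normal(size=(8, 64))  # stand-in for a batch of shape embeddings
    # Image embeddings that nearly match their shapes vs. unrelated ones.
    aligned = contrastive_image_shape_loss(shapes + 0.01 * rng.normal(size=(8, 64)), shapes)
    unrelated = contrastive_image_shape_loss(rng.normal(size=(8, 64)), shapes)
    print(f"aligned={aligned:.4f}  unrelated={unrelated:.4f}")
```

When each image embedding lies close to its own shape embedding, the diagonal of the similarity matrix dominates and the loss approaches zero; unrelated pairs leave it large. Minimizing such a loss is one way to pull corresponding images and shapes together in a joint embedding space that a conditional DDPM could then be guided by.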
