Paper Title


SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

Authors

Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing, Liangyan Gui

Abstract


In this work, we present a novel framework built to simplify 3D asset generation for amateur users. To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes and combinations of these, further allowing to adjust the strength of each input. At the core of our approach is an encoder-decoder, compressing 3D shapes into a compact latent representation, upon which a diffusion model is learned. To enable a variety of multi-modal inputs, we employ task-specific encoders with dropout followed by a cross-attention mechanism. Due to its flexibility, our model naturally supports a variety of tasks, outperforming prior works on shape completion, image-based 3D reconstruction, and text-to-3D. Most interestingly, our model can combine all these tasks into one swiss-army-knife tool, enabling the user to perform shape generation using incomplete shapes, images, and textual descriptions at the same time, providing the relative weights for each input and facilitating interactivity. Despite our approach being shape-only, we further show an efficient method to texture the generated shape using large-scale text-to-image models.
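The abstract describes combining several conditioning signals (partial shape, image, text) with a user-chosen relative weight per input, on top of a model trained with condition dropout. A common way to realize this is compositional classifier-free guidance: the dropout-trained unconditional prediction is combined with each modality's conditional prediction, scaled by its weight. The sketch below is an illustrative reconstruction, not the authors' code; the function name, the modality names, and the use of NumPy arrays as stand-ins for noise predictions are all assumptions.

```python
import numpy as np

def guided_noise(eps_uncond, eps_conds, weights):
    """Combine per-modality noise predictions, classifier-free-guidance style.

    eps_uncond : prediction with every condition dropped (available because
                 the model was trained with condition dropout).
    eps_conds  : dict mapping modality name -> conditional prediction.
    weights    : dict mapping modality name -> user-chosen guidance strength
                 (the "relative weight for each input" from the abstract).
    All names and signatures here are hypothetical.
    """
    out = eps_uncond.copy()
    for name, eps_c in eps_conds.items():
        # Each modality pushes the prediction away from the unconditional
        # one, scaled by its weight; weight 0 ignores that modality.
        out += weights[name] * (eps_c - eps_uncond)
    return out

# Toy usage: blend image and text guidance with different strengths.
eps_u = np.zeros(4)
eps_img = np.ones(4)
eps_txt = 2.0 * np.ones(4)
mix = guided_noise(eps_u,
                   {"image": eps_img, "text": eps_txt},
                   {"image": 1.0, "text": 0.5})  # -> array of 2.0
```

Setting a modality's weight to zero recovers generation without that input, which is what makes the single model usable for completion, reconstruction, text-to-3D, or any mixture of the three.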
