CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language

Paper Title

CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language

Authors

Aditya Sanghi, Rao Fu, Vivian Liu, Karl Willis, Hooman Shayani, Amir Hosein Khasahmadi, Srinath Sridhar, Daniel Ritchie

Abstract

Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this with a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP's image-text embedding space. We also present a novel variant of classifier-free guidance, which improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines. The code is available at https://ivl.cs.brown.edu/#/projects/clip-sculptor.
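The abstract names a novel variant of classifier-free guidance but does not describe it. As background only, the sketch below shows the standard classifier-free guidance rule applied to the next-token logits of a transformer over a discrete latent space. It is not CLIP-Sculptor's variant. It assumes the conditional logits come from the model given a CLIP embedding and the unconditional logits from the same model with the condition dropped. The function names, the guidance scale, and the toy logit values are hypothetical.

```python
import math
import random
from typing import Sequence


def guided_logits(cond: Sequence[float], uncond: Sequence[float], scale: float) -> list[float]:
    """Standard classifier-free guidance (hypothetical helper, not the paper's variant).

    guided = uncond + scale * (cond - uncond)
    scale = 0 gives the unconditional logits, scale = 1 the conditional ones,
    and scale > 1 pushes further toward the condition.
    """
    if len(cond) != len(uncond):
        raise ValueError("logit vectors must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(cond: Sequence[float], uncond: Sequence[float],
                 scale: float, rng: random.Random) -> int:
    """Sample one discrete latent code index from the guided distribution."""
    probs = softmax(guided_logits(cond, uncond, scale))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy logits over a hypothetical 3-entry codebook.
    cond = [2.0, 0.5, 0.0]
    uncond = [1.0, 1.0, 1.0]
    print(guided_logits(cond, uncond, 3.0))  # [4.0, -0.5, -2.0]
    print(sample_token(cond, uncond, 3.0, random.Random(0)))
```

Raising `scale` concentrates probability on the codes the text condition favours. This typically improves agreement with the prompt but makes samples less varied. That is the accuracy-diversity trade-off the abstract says its guidance variant improves.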
