Paper Title
CLIP-Sculptor: Zero-Shot Generation of High-Fidelity and Diverse Shapes from Natural Language
Authors
Abstract
Recent works have demonstrated that natural language can be used to generate and edit 3D shapes. However, these methods generate shapes with limited fidelity and diversity. We introduce CLIP-Sculptor, a method to address these constraints by producing high-fidelity and diverse 3D shapes without the need for (text, shape) pairs during training. CLIP-Sculptor achieves this in a multi-resolution approach that first generates in a low-dimensional latent space and then upscales to a higher resolution for improved shape fidelity. For improved shape diversity, we use a discrete latent space which is modeled using a transformer conditioned on CLIP's image-text embedding space. We also present a novel variant of classifier-free guidance, which improves the accuracy-diversity trade-off. Finally, we perform extensive experiments demonstrating that CLIP-Sculptor outperforms state-of-the-art baselines. The code is available at https://ivl.cs.brown.edu/#/projects/clip-sculptor.
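The abstract does not describe how its variant of classifier-free guidance works, so the sketch below shows only the standard formulation, applied to the per-token logits of a discrete latent model. It is meant to illustrate the accuracy-diversity trade-off in general terms and is not the CLIP-Sculptor implementation. The function names, the toy logits, and the guidance scales are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_logits(cond_logits, uncond_logits, scale):
    """Standard classifier-free guidance (assumed form, not the paper's variant).

    scale = 0 returns the unconditional logits, scale = 1 returns the
    conditional logits, and scale > 1 extrapolates past the conditional
    prediction, which sharpens the distribution toward the condition.
    """
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# Toy logits over three codebook entries (made-up values).
cond = [2.0, 0.5, -1.0]    # prediction given the text/image embedding
uncond = [0.5, 0.5, 0.5]   # prediction with the condition dropped

probs_plain = softmax(guided_logits(cond, uncond, 1.0))
probs_guided = softmax(guided_logits(cond, uncond, 3.0))
```

With a larger scale the probability mass concentrates on the entry favored by the condition, which tends to raise accuracy with respect to the prompt while reducing sample diversity.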