Paper title
Creative Painting with Latent Diffusion Models
Paper authors
Paper abstract
Artistic painting has made significant progress in recent years. Using an autoencoder to connect the original images with compressed latent spaces, and a cross-attention-enhanced U-Net as the diffusion backbone, latent diffusion models (LDMs) have achieved stable, high-fidelity image generation. In this paper, we focus on enhancing the creative painting ability of current LDMs in two directions: textual condition extension and model retraining with the Wikiart dataset. Through textual condition extension, users' input prompts are expanded with rich contextual knowledge so that the prompts are understood and explained more deeply. The Wikiart dataset contains 80K famous artworks created over the past 400 years by more than 1,000 famous artists in a rich variety of styles and genres. Through retraining, we are able to ask these artists to draw novel and creative paintings on modern topics. Direct comparisons with the original model show that creativity and artistry are enriched.