Paper Title
Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
Paper Authors
Paper Abstract
Diffusion-based text-to-image generation models such as GLIDE and DALL-E 2 have recently gained wide success for their superior performance in turning complex text inputs into images of high quality and wide diversity. In particular, they have proven to be very powerful in creating graphic art of various formats and styles. Although current models support specifying style formats such as oil painting or pencil drawing, fine-grained style features like color distributions and brush strokes are hard to specify, as they are randomly picked from a conditional distribution based on the given text input. Here we propose a novel style guidance method that supports generating images in arbitrary styles guided by a reference image. The method does not require a separate style transfer model to produce the desired style, while maintaining image quality in the generated content as controlled by the text input. Additionally, the guidance method can be applied without a style reference, denoted as self style guidance, to generate images of more diverse styles. Comprehensive experiments demonstrate that the proposed method remains robust and effective under a wide range of conditions, including diverse graphic art forms, image content types, and diffusion models.
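For intuition on how such reference-guided sampling can work, the sketch below follows the general pattern of classifier guidance: a style loss (here a Gram-matrix difference against the reference image) is differentiated with respect to the intermediate sample and used to steer each denoising step. This is an illustrative assumption, not the paper's implementation; the scheduler interface (`timesteps`, `predict_x0`, `step`), the feature extractor, and the guidance scale are all placeholders.

```python
# Minimal sketch of reference-based style guidance in a diffusion sampling loop.
# Assumptions (not from the paper): a Gram-matrix style loss, a generic
# scheduler with timesteps / predict_x0 / step, and a pretrained feature extractor.
import torch
import torch.nn.functional as F


def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    # feats: (B, C, H, W) -> (B, C, C) channel correlations, a common style statistic.
    b, c, h, w = feats.shape
    f = feats.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)


def style_loss(x: torch.Tensor, style_ref: torch.Tensor, feature_extractor) -> torch.Tensor:
    # Compare style statistics of the current estimate x with the reference image.
    g_x = gram_matrix(feature_extractor(x))
    g_ref = gram_matrix(feature_extractor(style_ref))
    return F.mse_loss(g_x, g_ref)


@torch.no_grad()
def sample_with_style_guidance(model, scheduler, text_emb, style_ref,
                               feature_extractor, shape, guidance_scale=1.0):
    x = torch.randn(shape)                                # start from Gaussian noise
    for t in scheduler.timesteps:                         # reverse diffusion
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            eps = model(x_in, t, text_emb)                # text-conditioned noise prediction
            x0_hat = scheduler.predict_x0(x_in, eps, t)   # assumed helper: estimate clean image
            loss = style_loss(x0_hat, style_ref, feature_extractor)
            grad = torch.autograd.grad(loss, x_in)[0]
        eps = eps + guidance_scale * grad                 # nudge prediction toward the reference style
        x = scheduler.step(eps, t, x)                     # standard denoising update
    return x
```

In this reading, content remains controlled by the text conditioning inside `model`, while the gradient of the style loss only biases the sampling trajectory, which is why no separate style transfer model is needed; a self-guidance variant could analogously use statistics of the sample itself instead of an external reference.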