DreamArtist++: Controllable One-Shot Text-to-Image Generation via Positive-Negative Adapter

Paper title

DreamArtist++: Controllable One-Shot Text-to-Image Generation via Positive-Negative Adapter

Authors

Ziyi Dong, Pengxu Wei, Liang Lin

Abstract

State-of-the-art text-to-image generation models such as Imagen and Stable Diffusion have made remarkable progress in synthesizing high-quality, feature-rich, high-resolution images guided by human text prompts. Because certain characteristics of image content, e.g., very specific object entities or styles, are hard to describe accurately in text, several example-based image generation approaches have been proposed, i.e., approaches that generate new concepts by absorbing the salient features of a few input references. Despite their acknowledged successes, these methods struggle to capture the characteristics of the reference examples accurately while keeping generation diverse and high-quality, particularly in the one-shot scenario (i.e., given only one reference). To tackle this problem, we propose a simple yet effective framework, DreamArtist, which adopts a novel positive-negative prompt-tuning learning strategy on a pre-trained diffusion model and has been shown to handle well the trade-off between accurate controllability and fidelity of image generation with only one reference example. Specifically, the framework incorporates both positive and negative embeddings or adapters and optimizes them jointly. The positive part aggressively captures the salient characteristics of the reference image to drive diversified generation, and the negative part rectifies inadequacies of the positive part. We have conducted extensive experiments and evaluated the proposed method on image similarity (fidelity) and diversity, generation controllability, and style cloning. DreamArtist achieves superior generation performance over existing methods. In addition, our evaluation on extended tasks, including concept composition and prompt-guided image editing, demonstrates its effectiveness for more applications.
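To make the idea of jointly optimizing a positive and a negative embedding more concrete, the following is a minimal toy sketch and not the authors' implementation. The abstract does not give the formula for combining the two parts, so the sketch assumes the standard classifier-free-guidance rule commonly used for negative prompts. The linear `toy_denoiser`, the guidance scale, the learning rate, and every function name here are hypothetical stand-ins chosen for illustration.

```python
# Toy sketch: jointly optimise a positive and a negative embedding.
# Assumption: the two predictions are combined with the classifier-free-guidance
# rule eps_neg + scale * (eps_pos - eps_neg); the abstract does not state this.


def toy_denoiser(embedding):
    """Hypothetical stand-in for a frozen diffusion model: a fixed linear map."""
    weights = [[0.5, -0.2], [0.1, 0.8]]
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


def guided_prediction(pos, neg, scale):
    """Combine the predictions conditioned on the positive and negative embeddings."""
    eps_pos = toy_denoiser(pos)
    eps_neg = toy_denoiser(neg)
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


def loss(pos, neg, target, scale):
    """Squared error between the guided prediction and a target vector."""
    pred = guided_prediction(pos, neg, scale)
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def numerical_grad(fn, vec, h=1e-5):
    """Central-difference gradient of fn at vec (keeps the sketch dependency-free)."""
    grads = []
    for i in range(len(vec)):
        up = list(vec)
        down = list(vec)
        up[i] += h
        down[i] -= h
        grads.append((fn(up) - fn(down)) / (2 * h))
    return grads


def train(target, scale=3.0, lr=0.02, steps=500):
    """Update both embeddings in the same step; the toy model itself stays frozen."""
    pos, neg = [0.1, 0.1], [-0.1, 0.0]
    for _ in range(steps):
        g_pos = numerical_grad(lambda v: loss(v, neg, target, scale), pos)
        g_neg = numerical_grad(lambda v: loss(pos, v, target, scale), neg)
        pos = [p - lr * g for p, g in zip(pos, g_pos)]
        neg = [n - lr * g for n, g in zip(neg, g_neg)]
    return pos, neg


if __name__ == "__main__":
    target = [1.0, -0.5]
    pos, neg = train(target)
    print("final loss:", loss(pos, neg, target, 3.0))
```

In this toy setting only the two embeddings receive gradient updates, which mirrors the prompt-tuning idea of leaving the pre-trained model untouched. A real system would replace `toy_denoiser` with a diffusion model's noise prediction and the target with the noise added to the reference image.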
