DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Paper Title

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Authors

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman

Abstract

Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/
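The abstract names two ingredients: a unique identifier bound to the subject, and a class-specific prior preservation loss. The following is a minimal sketch of how these might fit together, not the authors' implementation. It assumes the common formulation in which a reconstruction term on the subject images is added to a weighted reconstruction term on images the pretrained model generated for the plain class prompt. The function names, the `[V]` placeholder token, the prompt template, and `prior_weight` are illustrative assumptions; plain Python lists stand in for model predictions and targets.

```python
from typing import Sequence, Tuple


def build_prompts(identifier: str, class_noun: str) -> Tuple[str, str]:
    """Return (instance_prompt, class_prompt).

    The instance prompt pairs the unique identifier with the class noun;
    the class prompt omits the identifier and is used for the prior term.
    The "a ..." template is an assumption made for illustration.
    """
    instance_prompt = f"a {identifier} {class_noun}"
    class_prompt = f"a {class_noun}"
    return instance_prompt, class_prompt


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(
    instance_pred: Sequence[float],
    instance_target: Sequence[float],
    prior_pred: Sequence[float],
    prior_target: Sequence[float],
    prior_weight: float = 1.0,  # hypothetical default, not from the paper
) -> float:
    """Subject reconstruction term plus a weighted class-prior term.

    The prior term is computed on samples generated for the class prompt,
    which is intended to keep the model's notion of the class from
    collapsing onto the single subject during fine-tuning.
    """
    subject_term = mse(instance_pred, instance_target)
    prior_term = mse(prior_pred, prior_target)
    return subject_term + prior_weight * prior_term


if __name__ == "__main__":
    print(build_prompts("[V]", "dog"))
    print(prior_preservation_loss([1.0, 2.0], [0.0, 2.0],
                                  [0.0, 0.0], [2.0, 0.0],
                                  prior_weight=0.5))
```

In an actual fine-tuning loop the sequences would be tensors (for example, predicted and target noise from a diffusion model), and the weight on the prior term would be a tuned hyperparameter.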
