Paper title
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
Paper authors
Paper abstract
Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they replicating content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data.
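To make the retrieval idea concrete, here is a minimal sketch of how generated images might be compared against training samples. It is not the authors' implementation: the abstract does not say which feature extractor or similarity measure is used, so this sketch assumes that each image has already been mapped to a feature vector by some embedding model, and that cosine similarity above a hypothetical `threshold` marks a candidate replication. The function names are illustrative only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def find_replication_candidates(gen_feats, train_feats, threshold=0.5):
    """Return (generated_idx, nearest_train_idx, similarity) for close matches.

    gen_feats:   (n_generated, d) feature vectors of generated images
    train_feats: (n_train, d) feature vectors of training images
    threshold:   assumed cutoff; a real study would calibrate this value
    """
    gen = l2_normalize(np.asarray(gen_feats, dtype=np.float64))
    train = l2_normalize(np.asarray(train_feats, dtype=np.float64))

    # Cosine similarity between every generated image and every training image.
    sims = gen @ train.T

    # For each generated image, keep only its most similar training image.
    nearest = sims.argmax(axis=1)
    best = sims[np.arange(len(gen)), nearest]

    return [
        (i, int(nearest[i]), float(best[i]))
        for i in range(len(gen))
        if best[i] >= threshold
    ]


if __name__ == "__main__":
    # Toy 2-D features standing in for real image embeddings.
    train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    generated = [[2.0, 0.0], [1.0, -1.0]]
    print(find_replication_candidates(generated, train, threshold=0.9))
```

The dense similarity matrix is fine for small sets such as Oxford Flowers, but a LAION-scale training set would need an approximate nearest-neighbor index instead.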