Paper Title

Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works

Paper Authors

Hyung-Kwon Ko, Gwanmo Park, Hyeon Jeon, Jaemin Jo, Juho Kim, Jinwook Seo

Paper Abstract

Large-scale Text-to-image Generation Models (LTGMs) (e.g., DALL-E), self-supervised deep learning models trained on a huge dataset, have demonstrated the capacity for generating high-quality open-domain images from multi-modal input. Although they can even produce anthropomorphized versions of objects and animals, combine irrelevant concepts in reasonable ways, and create variations of any user-provided image, we witnessed that such rapid technological advancement has left many visual artists disoriented about how to leverage LTGMs more actively in their creative works. Our goal in this work is to understand how visual artists would adopt LTGMs to support their creative works. To this end, we conducted an interview study as well as a systematic literature review of 72 system/application papers for a thorough examination. A total of 28 visual artists covering 35 distinct visual art domains acknowledged LTGMs' versatile roles with high usability to support creative works by automating the creation process (i.e., automation), expanding their ideas (i.e., exploration), and facilitating or arbitrating in communication (i.e., mediation). We conclude by providing four design guidelines that future researchers can refer to in making intelligent user interfaces using LTGMs.
