Paper Title
Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation
Paper Authors
Abstract
Fashion attribute editing is a task that aims to convert the semantic attributes of a given fashion image while preserving the irrelevant regions. Previous works typically employ conditional GANs, where the generator explicitly learns the target attributes and directly executes the conversion. These approaches, however, are neither scalable nor generic, as they operate only on a few limited attributes and require a separate generator for each dataset or attribute set. Inspired by the recent advancement of diffusion models, we explore classifier-guided diffusion, which leverages an off-the-shelf diffusion model pretrained on general visual semantics such as ImageNet. To achieve a generic editing pipeline, we pose this as a multi-attribute image manipulation task, where the attributes range from item category, fabric, and pattern to collar and neckline. We empirically show that conventional methods fail in our challenging setting, and study an efficient adaptation scheme that involves the recently introduced attention-pooling technique to obtain multi-attribute classifier guidance. Based on this, we present a mask-free fashion attribute editing framework that leverages the classifier logits and the cross-attention maps for manipulation. We empirically demonstrate that our framework achieves convincing sample quality and attribute alignment.
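The classifier guidance the abstract builds on (in the style of Dhariwal and Nichol) shifts the denoising mean at each reverse-diffusion step by the scaled gradient of the classifier's log-probability for the target attribute. A minimal NumPy sketch of one such guided step follows; the function name, the toy variance schedule, and the `scale` hyperparameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def classifier_guided_step(mean, sigma, grad_log_p, scale=1.0, rng=None):
    """One guided reverse-diffusion step (classifier-guidance sketch).

    mean       : unconditional denoising mean predicted by the diffusion model
    sigma      : noise scale for this timestep
    grad_log_p : gradient of the classifier log-probability of the target
                 attribute w.r.t. the current sample (illustrative input)
    scale      : guidance strength (a toy hyperparameter here)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Shift the mean toward higher classifier confidence for the attribute.
    guided_mean = mean + scale * (sigma ** 2) * grad_log_p
    # Standard ancestral sampling noise.
    return guided_mean + sigma * rng.standard_normal(mean.shape)
```

With `scale=0` (or `sigma=0`) the step reduces to unguided sampling, which makes the guidance term easy to isolate when debugging.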