Paper Title
Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion Models
Paper Authors
Paper Abstract
Generating photos that satisfy multiple constraints finds broad utility in the content-creation industry. A key hurdle to accomplishing this task is the need for paired data consisting of all modalities (i.e., constraints) and their corresponding output. Moreover, existing methods need retraining with paired data across all modalities to introduce a new condition. This paper proposes a solution to this problem based on denoising diffusion probabilistic models (DDPMs). Our motivation for choosing diffusion models over other generative models comes from their flexible internal structure. Since each sampling step in a DDPM follows a Gaussian distribution, we show that there exists a closed-form solution for generating an image given multiple constraints. Our method can unite multiple diffusion models trained on different sub-tasks and conquer the combined task through our proposed sampling strategy. We also introduce a novel reliability parameter that allows different off-the-shelf diffusion models, trained on different datasets, to be used at sampling time alone to guide the generation toward an outcome satisfying multiple constraints. We perform experiments on various standard multimodal tasks to demonstrate the effectiveness of our approach. More details can be found at https://nithin-gk.github.io/projectpages/Multidiff/index.html
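The abstract states that, because each DDPM sampling step is Gaussian, noise predictions from several condition-specific models can be fused in closed form. The sketch below is a minimal illustration of that idea, not the authors' exact formulation: it assumes hypothetical noise-prediction networks `eps_theta_i(x_t, t, c_i)` and reliability weights supplied by the caller, and fuses them as a simple weighted average inside an otherwise standard DDPM reverse step.

```python
import torch

def combined_ddpm_step(x_t, t, models, conditions, reliabilities,
                       alpha_t, alpha_bar_t, sigma_t):
    """One reverse-diffusion step that fuses noise predictions from several
    condition-specific diffusion models (illustrative sketch only).

    x_t           : current noisy sample, shape (B, C, H, W)
    t             : current timestep (int)
    models        : list of noise-prediction networks eps_i(x_t, t, c_i)
    conditions    : list of per-model conditioning inputs c_i
    reliabilities : list of non-negative weights (assumed to sum to 1)
    alpha_t, alpha_bar_t, sigma_t : standard DDPM schedule terms at step t
    """
    # Reliability-weighted combination of the per-condition noise estimates.
    eps = torch.zeros_like(x_t)
    for model, cond, w in zip(models, conditions, reliabilities):
        eps = eps + w * model(x_t, t, cond)

    # Standard DDPM posterior mean computed with the fused noise estimate.
    mean = (x_t - (1.0 - alpha_t) / torch.sqrt(1.0 - alpha_bar_t) * eps) \
           / torch.sqrt(alpha_t)

    # Inject noise on all but the final step.
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + sigma_t * noise
```

Here the reliability weights play the role described in the abstract: models trained on different datasets can be down-weighted or up-weighted at sampling time alone, without any retraining, to steer the sample toward an output that satisfies all constraints.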