Dimensionality-Varying Diffusion Process

Paper Title

Dimensionality-Varying Diffusion Process

Authors

Han Zhang, Ruili Feng, Zhantao Yang, Lianghua Huang, Yu Liu, Yifei Zhang, Yujun Shen, Deli Zhao, Jingren Zhou, Fan Cheng

Abstract

Diffusion models, which learn to reverse a signal destruction process to generate new data, typically require the signal at each step to have the same dimension. We argue that, considering the spatial redundancy in image signals, there is no need to maintain a high dimensionality in the evolution process, especially in the early generation phase. To this end, we make a theoretical generalization of the forward diffusion process via signal decomposition. Concretely, we manage to decompose an image into multiple orthogonal components and control the attenuation of each component when perturbing the image. That way, as the noise strength increases, we are able to diminish those inconsequential components and thus use a lower-dimensional signal to represent the source, barely losing information. Such a reformulation allows the dimensionality to vary in both training and inference of diffusion models. Extensive experiments on a range of datasets suggest that our approach substantially reduces the computational cost and achieves on-par or even better synthesis performance compared to baseline methods. We also show that our strategy facilitates high-resolution image synthesis and improves the FID of a diffusion model trained on FFHQ at $1024\times1024$ resolution from 52.40 to 10.46. Code and models will be made publicly available.
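
The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' construction: it assumes, purely for illustration, that the two orthogonal components are a block-averaged (coarse) image and the remaining detail, and the names `decompose`, `perturb`, `coarse_scale`, and `detail_scale` are hypothetical. Once the detail component has been attenuated to zero, the signal part is fully described by the smaller `low` array, which is the sense in which a lower-dimensional signal can stand in for the source.

```python
import numpy as np

def decompose(image, factor=2):
    """Split a 2-D image into a block-average component and its residual.

    Assumption for illustration only: block averaging stands in for the
    orthogonal decomposition mentioned in the abstract.
    """
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image sides must be divisible by factor")
    # Group pixels into factor x factor blocks and average each block.
    blocks = image.reshape(h // factor, factor, w // factor, factor)
    low = blocks.mean(axis=(1, 3))                      # shape (h/f, w/f)
    # Upsample the block means back to full size (nearest neighbour).
    coarse = np.repeat(np.repeat(low, factor, axis=0), factor, axis=1)
    detail = image - coarse                             # orthogonal to coarse
    return low, coarse, detail

def perturb(image, coarse_scale, detail_scale, noise_std, rng, factor=2):
    """Attenuate each component separately, then add Gaussian noise."""
    _, coarse, detail = decompose(image, factor)
    noise = rng.standard_normal(image.shape)
    return coarse_scale * coarse + detail_scale * detail + noise_std * noise
```

Block averaging is an orthogonal projection onto block-constant images, so the coarse and detail parts have zero inner product. Setting `detail_scale` to a smaller value than `coarse_scale` as the noise grows mimics the idea of diminishing the less important component first.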
