Paper Title

Lossy Image Compression with Conditional Diffusion Models

Paper Authors

Ruihan Yang, Stephan Mandt

Abstract

This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional ``content'' latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining ``texture'' variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with $\mathcal{X}$-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality. Our code is available at: \url{https://github.com/buggyyang/CDC_compression}
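The decoding scheme the abstract describes — a reverse diffusion process conditioned on a transmitted "content" latent, with the remaining "texture" noise synthesized at decoding time, and an $\mathcal{X}$-parameterization in which the network predicts the clean image directly — can be illustrated with a toy sketch. Everything below (the stand-in `denoiser`, the step coefficients, the latent shape) is hypothetical and only meant to show the control flow, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(x_t, z, t):
    # Hypothetical stand-in for the trained network. With the
    # x-parameterization it predicts the clean image x_0 directly,
    # conditioned on the content latent z (t is unused in this toy).
    return 0.5 * x_t + 0.5 * z

def decode(z, num_steps=4):
    # "Texture" variables are synthesized at decode time: start from noise
    # drawn locally, so only z needs to be entropy-coded and transmitted.
    x = rng.standard_normal(z.shape)
    for i in range(num_steps, 0, -1):
        x0_hat = denoiser(x, z, i)
        if i > 1:
            # Schematic update: blend the clean-image prediction back toward
            # the current sample (coefficients are illustrative only).
            alpha = (i - 1) / num_steps
            x = alpha * x + (1 - alpha) * x0_hat
        else:
            # Because the network predicts x_0 itself, the final step can
            # just output the prediction -- this is why few decoding steps
            # can already give a usable reconstruction.
            x = x0_hat
    return x

z = np.ones((8, 8))  # toy "content" latent recovered from the bitstream
recon = decode(z)
print(recon.shape)
```

The key practical point mirrored here is that the bitstream only carries `z`; the stochastic texture comes from freshly sampled noise, so repeated decodings of the same bitstream can differ in fine texture while agreeing on content.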
