Paper Title
Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models
Paper Authors
Paper Abstract
Transferring large amounts of high-resolution images over limited bandwidth is an important but very challenging task. Compressing images at extremely low bitrates (<0.1 bpp) has been studied, but it often results in low-quality images with heavy artifacts due to the severe constraint on the number of bits available for the compressed data. It is often said that a picture is worth a thousand words, yet language is remarkably powerful at capturing the essence of an image in a short description. Motivated by the recent success of diffusion models in text-to-image generation, we propose a generative image compression method that demonstrates the potential of saving an image as a short text embedding, which in turn can be used to generate high-fidelity images that are perceptually equivalent to the original. For a given image, its corresponding text embedding is learned using the same optimization process as the text-to-image diffusion model itself: the original text transformer is bypassed and a learnable text embedding is used directly as the conditioning input. This optimization is applied together with a learned compression model to achieve extreme compression at bitrates below 0.1 bpp. In our experiments, measured by a comprehensive set of image quality metrics, our method outperforms other state-of-the-art deep learning methods in terms of both perceptual quality and diversity.
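To make the described procedure concrete, below is a minimal sketch of the embedding-learning step as the abstract outlines it, not the authors' released code. It assumes a Stable-Diffusion-style latent diffusion model accessed through the Hugging Face diffusers library; the checkpoint name, the (1, 77, 768) embedding shape, and all hyperparameters are illustrative assumptions, and the learned compression model that quantizes the embedding to <0.1 bpp is omitted.

import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

model_id = "runwayml/stable-diffusion-v1-5"  # assumed SD-style checkpoint
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae").requires_grad_(False)
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet").requires_grad_(False)
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Bypass the text transformer: optimize the conditioning tensor directly.
# The (1, 77, 768) shape matches SD v1.x CLIP embeddings (an assumption here).
text_embedding = torch.nn.Parameter(torch.randn(1, 77, 768) * 0.01)
optimizer = torch.optim.Adam([text_embedding], lr=1e-3)

def fit_embedding(image: torch.Tensor, num_steps: int = 1000) -> torch.Tensor:
    """image: a (1, 3, 512, 512) tensor scaled to [-1, 1]."""
    with torch.no_grad():
        latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor
    for _ in range(num_steps):
        noise = torch.randn_like(latents)
        t = torch.randint(0, scheduler.config.num_train_timesteps, (1,))
        noisy_latents = scheduler.add_noise(latents, noise, t)
        # Same denoising objective the diffusion model was trained with,
        # but gradients flow only into the learnable text embedding.
        noise_pred = unet(noisy_latents, t, encoder_hidden_states=text_embedding).sample
        loss = F.mse_loss(noise_pred, noise)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return text_embedding.detach()  # to be quantized/entropy-coded downstream

Decoding then amounts to running the standard text-to-image sampling loop with the recovered embedding as the conditioning input, reconstructing an image that is perceptually equivalent to the original.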