Video Coding Using Learned Latent GAN Compression

Paper Title

Video Coding Using Learned Latent GAN Compression

Authors

Mustafa Shukor, Bharath Bhushan Damodaran, Xu Yao, Pierre Hellier

Abstract

We propose in this paper a new paradigm for facial video compression. We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, including intra and inter compression. Each frame is inverted in the latent space of StyleGAN, from which the optimal compression is learned. To do so, a diffeomorphic latent representation is learned using a normalizing flows model, where an entropy model can be optimized for image coding. In addition, we propose a new perceptual loss that is more efficient than other counterparts. Finally, an entropy model for video inter coding with residual is also learned in the previously constructed latent representation. Our method (SGANC) is simple, faster to train, and achieves better results for image and video coding compared to state-of-the-art codecs such as VTM, AV1, and recent deep learning techniques. In particular, it drastically minimizes perceptual distortion at low bit rates.
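
The idea of inter coding with a residual in a latent space can be illustrated with a minimal sketch. This is not the SGANC implementation: the abstract describes a normalizing-flow representation and a learned entropy model, whereas the sketch below substitutes uniform scalar quantization and an empirical entropy estimate, and treats each frame's latent code as a plain list of floats. All function names and the `step` parameter are hypothetical.

```python
import math
from collections import Counter


def quantize(vec, step):
    """Uniform scalar quantization: map each value to an integer symbol."""
    return [round(v / step) for v in vec]


def dequantize(symbols, step):
    """Map integer symbols back to reconstructed values."""
    return [s * step for s in symbols]


def encode_latents(latents, step=0.1):
    """Code each latent as a quantized residual against the previous
    reconstruction (closed loop). The first frame is predicted from zeros,
    which amounts to intra coding it directly."""
    streams = []
    prev = [0.0] * len(latents[0])
    for w in latents:
        residual = [a - b for a, b in zip(w, prev)]
        symbols = quantize(residual, step)
        streams.append(symbols)
        # Track what the decoder will reconstruct, so errors do not drift.
        prev = [p + r for p, r in zip(prev, dequantize(symbols, step))]
    return streams


def decode_latents(streams, step=0.1):
    """Invert encode_latents by accumulating dequantized residuals."""
    prev = [0.0] * len(streams[0])
    decoded = []
    for symbols in streams:
        prev = [p + r for p, r in zip(prev, dequantize(symbols, step))]
        decoded.append(prev)
    return decoded


def empirical_bits(symbols):
    """Total bits under the empirical symbol distribution; a stand-in for
    the learned entropy model mentioned in the abstract."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())
```

When consecutive frames have similar latent codes, the residual symbols cluster around zero, so an entropy coder spends fewer bits on them than on the raw latents. In this sketch a decoded latent would then be passed to the generator to synthesize the frame; that step is omitted here.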
