A Log-likelihood Regularized KL Divergence for Video Prediction with A 3D Convolutional Variational Recurrent Network

Paper title

A Log-likelihood Regularized KL Divergence for Video Prediction with A 3D Convolutional Variational Recurrent Network

Authors

Haziq Razali, Basura Fernando

Abstract

Latent variable models have proven to be a powerful tool for modeling probability distributions over sequences. In this paper, we introduce a new variational model that extends the recurrent network in two ways for the task of video frame prediction. First, we introduce 3D convolutions inside all modules, including the recurrent model for future frame prediction, inputting and outputting a sequence of video frames at each timestep. This enables us to better exploit spatiotemporal information inside the variational recurrent model, allowing us to generate high-quality predictions. Second, we enhance the latent loss of the variational model by introducing a maximum likelihood estimate in addition to the KL divergence that is commonly used in variational models. This simple extension acts as a stronger regularizer in the variational autoencoder loss function and lets us obtain better results and generalizability. Experiments show that our model outperforms existing video prediction methods on several benchmarks while requiring fewer parameters.
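
The abstract does not give the exact form of the regularized latent loss. As a minimal sketch of one possible reading, the code below adds a weighted negative log-likelihood of a posterior sample under the prior to the usual KL term between diagonal Gaussians. The function names, the `weight` parameter, and the choice to evaluate the likelihood under the prior are all assumptions made for illustration, not the authors' confirmed formulation.

```python
import math
import random


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def gaussian_nll(z, mu, sigma):
    """Negative log-likelihood of z under a diagonal Gaussian N(mu, sigma^2)."""
    total = 0.0
    for zi, m, s in zip(z, mu, sigma):
        total += 0.5 * math.log(2 * math.pi) + math.log(s) + (zi - m) ** 2 / (2 * s ** 2)
    return total


def latent_loss(mu_q, sigma_q, mu_p, sigma_p, weight=1.0, rng=random):
    """Hypothetical latent loss: KL term plus a weighted likelihood term.

    Assumption: the likelihood term scores a reparameterized posterior
    sample under the prior. The paper may define it differently.
    """
    # Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, 1).
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_q, sigma_q)]
    kl = kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)
    return kl + weight * gaussian_nll(z, mu_p, sigma_p)
```

Setting `weight=0.0` recovers the plain KL latent loss, which makes it straightforward to compare the two variants under otherwise identical settings.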
