Paper Title

A unified model for continuous conditional video prediction

Paper Authors

Xi Ye, Guillaume-Alexandre Bilodeau

Paper Abstract

Different conditional video prediction tasks, like video future frame prediction and video frame interpolation, are normally solved by task-related models even though they share many common underlying characteristics. Furthermore, almost all conditional video prediction models can only achieve discrete prediction. In this paper, we propose a unified model that addresses these two issues at the same time. We show that conditional video prediction can be formulated as a neural process, which maps input spatio-temporal coordinates to target pixel values given context spatio-temporal coordinates and context pixel values. Specifically, we feed the implicit neural representation of coordinates and context pixel features into a Transformer-based non-autoregressive conditional video prediction model. Our task-specific models outperform previous work for video future frame prediction and video interpolation on multiple datasets. Importantly, the model is able to interpolate or predict with an arbitrary high frame rate, i.e., continuous prediction. Our source code is available at https://npvp.github.io.
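
To make the neural-process formulation concrete, here is a minimal sketch of the idea in PyTorch. It is an illustration under our own assumptions, not the authors' released implementation: the class name, the Fourier-feature coordinate encoding, and all layer sizes below are hypothetical.

```python
# Minimal sketch of conditional video prediction as a neural process:
# target spatio-temporal coordinates are mapped to pixel values given
# context (coordinate, pixel) pairs. Names and sizes are illustrative,
# not the paper's released architecture.
import torch
import torch.nn as nn


def fourier_features(coords, num_bands=8):
    """Implicit neural representation of (x, y, t) coordinates as
    Fourier features, a common choice for coordinate networks."""
    freqs = 2.0 ** torch.arange(num_bands, dtype=coords.dtype,
                                device=coords.device)
    angles = coords.unsqueeze(-1) * freqs             # (..., 3, num_bands)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(-2)                            # (..., 3 * 2 * num_bands)


class NeuralProcessVideoPredictor(nn.Module):
    """Transformer that decodes all target pixel queries in parallel
    (non-autoregressive), conditioned on encoded context pixels."""

    def __init__(self, coord_dim=48, pixel_dim=3, d_model=128):
        super().__init__()
        self.context_proj = nn.Linear(coord_dim + pixel_dim, d_model)
        self.target_proj = nn.Linear(coord_dim, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True)
        self.head = nn.Linear(d_model, pixel_dim)     # predicted RGB

    def forward(self, ctx_coords, ctx_pixels, tgt_coords):
        ctx = self.context_proj(
            torch.cat([fourier_features(ctx_coords), ctx_pixels], dim=-1))
        tgt = self.target_proj(fourier_features(tgt_coords))
        # No causal mask: every target query attends freely, so the
        # prediction is produced in one parallel decoding pass.
        return self.head(self.transformer(ctx, tgt))
```

Because targets are specified only by their (x, y, t) coordinates, the time coordinate t can take any real value, which is what allows interpolation or prediction at an arbitrary frame rate:

```python
model = NeuralProcessVideoPredictor()
ctx_coords = torch.rand(1, 100, 3)  # context (x, y, t) samples from observed frames
ctx_pixels = torch.rand(1, 100, 3)  # their observed RGB values
tgt_coords = torch.rand(1, 50, 3)   # queries; t may fall between frame times
pred = model(ctx_coords, ctx_pixels, tgt_coords)
print(pred.shape)                   # torch.Size([1, 50, 3])
```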
