Paper Title

Optimizing Video Prediction via Video Frame Interpolation

Authors

Yue Wu, Qiang Wen, Qifeng Chen

Abstract

Video prediction is an extrapolation task that predicts future frames given past frames, and video frame interpolation is an interpolation task that estimates intermediate frames between two frames. We have witnessed the tremendous advancement of video frame interpolation, but the general video prediction in the wild is still an open question. Inspired by the photo-realistic results of video frame interpolation, we present a new optimization framework for video prediction via video frame interpolation, in which we solve an extrapolation problem based on an interpolation model. Our video prediction framework is based on optimization with a pretrained differentiable video frame interpolation module without the need for a training dataset, and thus there is no domain gap issue between training and test data. Also, our approach does not need any additional information such as semantic or instance maps, which makes our framework applicable to any video. Extensive experiments on the Cityscapes, KITTI, DAVIS, Middlebury, and Vimeo90K datasets show that our video prediction results are robust in general scenarios, and our approach outperforms other video prediction methods that require a large amount of training data or extra semantic information.
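To make the abstract's core idea concrete, here is a minimal PyTorch sketch of how one might invert a pretrained, differentiable frame interpolation model to extrapolate a future frame: if the candidate future frame is correct, interpolating between the previous frame and the candidate should reproduce the observed current frame. This is an illustration under stated assumptions, not the authors' implementation: `vfi_model` stands for any frozen interpolation network that maps two frames to their midpoint frame, and `predict_next_frame`, the initialization, the L1 reconstruction loss, and the optimizer settings are all hypothetical choices.

```python
# Minimal sketch (not the paper's released code) of optimization-based
# video prediction via a pretrained, differentiable interpolation model.
import torch


def predict_next_frame(frame_prev, frame_curr, vfi_model, steps=300, lr=1e-2):
    """Extrapolate I_{t+1} from observed frames I_{t-1} (frame_prev) and
    I_t (frame_curr) by solving an inverse interpolation problem:
    interpolating between I_{t-1} and the candidate I_{t+1} should
    reconstruct the observed middle frame I_t."""
    vfi_model.eval()
    for p in vfi_model.parameters():   # freeze the pretrained model;
        p.requires_grad_(False)        # only the candidate frame is optimized
    # Hypothetical initialization: start the candidate from the last frame.
    future = frame_curr.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([future], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        # Differentiable VFI: midpoint of the previous and candidate frames.
        midpoint = vfi_model(frame_prev, future)
        # L1 reconstruction loss against the observed current frame
        # (an illustrative choice, not necessarily the paper's objective).
        loss = (midpoint - frame_curr).abs().mean()
        loss.backward()
        optimizer.step()
    # Assuming frames are normalized to [0, 1].
    return future.detach().clamp(0.0, 1.0)
```

Because only the candidate frame is optimized against a frozen interpolation module, the procedure needs no training dataset of its own, which is why the abstract can claim there is no domain gap between training and test data.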
