PreCNet: Next-Frame Video Prediction Based on Predictive Coding

Paper Title

PreCNet: Next-Frame Video Prediction Based on Predictive Coding

Authors

Zdenek Straka, Tomas Svoboda, Matej Hoffmann

Abstract

Predictive coding, currently a highly influential theory in neuroscience, has not yet been widely adopted in machine learning. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network we propose (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images from an urban environment recorded from a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, SSIM) improved further when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully grounded in a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.
