Paper Title

Flow-Guided Sparse Transformer for Video Deblurring

Paper Authors

Jing Lin, Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Youliang Yan, Xueyi Zou, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool

Paper Abstract

Exploiting similar and sharper scene patches in spatio-temporal neighborhoods is critical for video deblurring. However, CNN-based methods show limitations in capturing long-range dependencies and modeling non-local self-similarity. In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. In FGST, we customize a self-attention module, Flow-Guided Sparse Window-based Multi-head Self-Attention (FGSW-MSA). For each $query$ element on the blurry reference frame, FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse yet highly related $key$ elements corresponding to the same scene patch in neighboring frames. Besides, we present a Recurrent Embedding (RE) mechanism to transfer information from past frames and strengthen long-range temporal dependencies. Comprehensive experiments demonstrate that our proposed FGST outperforms state-of-the-art (SOTA) methods on both DVD and GOPRO datasets and even yields more visually pleasing results in real video deblurring. Code and pre-trained models are publicly available at https://github.com/linjing7/VR-Baseline.
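
To make the FGSW-MSA idea concrete, below is a minimal single-head PyTorch sketch of flow-guided sparse key sampling followed by attention. It is an illustrative approximation, not the authors' implementation (see the repository above for that): the bilinear `grid_sample` gather, the hand-picked `offsets` pattern, and the names `flow_guided_key_sampling` and `fgsw_attention` are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F


def flow_guided_key_sampling(feat_nbr, flow, offsets):
    """Gather spatially sparse key/value elements from a neighboring frame.

    feat_nbr: (B, C, H, W) features of a neighboring frame
    flow:     (B, 2, H, W) optical flow from the reference frame to that
              neighbor, in pixels, channel order (x, y)
    offsets:  iterable of (dx, dy) window offsets defining the sparse pattern
    returns:  (B, K, C, H, W) with K = len(offsets) sampled keys per query
    """
    B, C, H, W = feat_nbr.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=feat_nbr.device, dtype=feat_nbr.dtype),
        torch.arange(W, device=feat_nbr.device, dtype=feat_nbr.dtype),
        indexing="ij",
    )
    samples = []
    for dx, dy in offsets:
        # Each query position is shifted by the estimated flow plus a
        # sparse window offset, then sampled bilinearly.
        x = xs + flow[:, 0] + dx
        y = ys + flow[:, 1] + dy
        grid = torch.stack(
            (2.0 * x / (W - 1) - 1.0, 2.0 * y / (H - 1) - 1.0), dim=-1
        )  # (B, H, W, 2), normalized to [-1, 1] as grid_sample expects
        samples.append(F.grid_sample(feat_nbr, grid, align_corners=True))
    return torch.stack(samples, dim=1)


def fgsw_attention(q, sampled_kv):
    """Single-head attention between each query and its K sampled keys."""
    B, K, C, H, W = sampled_kv.shape
    q = q.permute(0, 2, 3, 1).reshape(B * H * W, 1, C)
    kv = sampled_kv.permute(0, 3, 4, 1, 2).reshape(B * H * W, K, C)
    attn = torch.softmax(q @ kv.transpose(1, 2) / C ** 0.5, dim=-1)
    out = attn @ kv  # (B*H*W, 1, C)
    return out.reshape(B, H, W, C).permute(0, 3, 1, 2)


# Toy usage: 64-channel features at 32x32 with a 3-point sparse pattern.
q = torch.randn(1, 64, 32, 32)
feat_nbr = torch.randn(1, 64, 32, 32)
flow = torch.randn(1, 2, 32, 32)
kv = flow_guided_key_sampling(feat_nbr, flow, [(-1, 0), (0, 0), (1, 0)])
out = fgsw_attention(q, kv)  # (1, 64, 32, 32)
```

Per the abstract, the full model samples keys from multiple neighboring frames with multi-head attention, and the Recurrent Embedding mechanism additionally carries features forward from past frames; both are omitted here for brevity.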
