Paper Title

Activating More Pixels in Image Super-Resolution Transformer

Authors

Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

Abstract

Transformer-based methods have shown impressive performance in low-level vision tasks such as image super-resolution. However, through attribution analysis we find that these networks can only utilize a limited spatial range of the input information. This implies that the potential of Transformers is still not fully exploited in existing networks. In order to activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines both channel attention and window-based self-attention schemes, thus drawing on their complementary strengths: the ability to exploit global statistics and strong local fitting capability. Moreover, to better aggregate cross-window information, we introduce an overlapping cross-attention module to enhance the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that the performance of this task can be greatly improved. Our overall method significantly outperforms the state-of-the-art methods by more than 1 dB. Code and models are available at https://github.com/XPixelGroup/HAT.
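
To make the two attention schemes concrete, below is a minimal PyTorch sketch of the hybrid attention idea: a window-based self-attention branch and a convolution-plus-channel-attention branch whose outputs are summed into the same residual stream. This is an illustration under simplifying assumptions, not the official implementation from the linked repository: the class names and the cab_weight parameter are ours, nn.MultiheadAttention stands in for the paper's relative-position window attention, and the shifted windows, overlapping cross-attention module, and MLP sub-layer are omitted.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention: global average
    pooling yields per-channel statistics that gate the feature map."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # global statistics
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)

class HybridAttentionBlock(nn.Module):
    """Window self-attention (strong local fitting) in parallel with a
    convolutional channel-attention branch (global statistics); both
    outputs are summed into the same residual stream."""

    def __init__(self, dim: int, num_heads: int, window_size: int = 16,
                 cab_weight: float = 0.01):
        super().__init__()
        self.window_size = window_size
        self.cab_weight = cab_weight                  # small scale on the conv branch
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cab = nn.Sequential(                     # channel attention block
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
            ChannelAttention(dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W); H and W assumed divisible by window_size.
        b, c, h, w = x.shape
        ws = self.window_size
        # Partition into non-overlapping windows: (B * nWindows, ws*ws, C).
        win = (x.view(b, c, h // ws, ws, w // ws, ws)
                .permute(0, 2, 4, 3, 5, 1)
                .reshape(-1, ws * ws, c))
        win = self.norm(win)
        sa, _ = self.attn(win, win, win)              # self-attention per window
        # Merge windows back to (B, C, H, W).
        sa = (sa.view(b, h // ws, w // ws, ws, ws, c)
                .permute(0, 5, 1, 3, 2, 4)
                .reshape(b, c, h, w))
        return x + sa + self.cab_weight * self.cab(x)

# Example usage with hypothetical sizes.
block = HybridAttentionBlock(dim=180, num_heads=6, window_size=16)
out = block(torch.randn(1, 180, 64, 64))              # -> (1, 180, 64, 64)
```

Scaling the convolutional branch by a small constant before adding it to the residual reflects the paper's design, where a small weight on the channel-attention block is used to keep it from interfering with the optimization of the self-attention branch.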
