Paper Title
Learning Prior Feature and Attention Enhanced Image Inpainting
Authors
Abstract
Many recent inpainting works have achieved impressive results by leveraging Deep Neural Networks (DNNs) to model various kinds of prior information for image restoration. Unfortunately, the performance of these methods is largely limited by the representation ability of vanilla Convolutional Neural Network (CNN) backbones. On the other hand, Vision Transformers (ViTs) with self-supervised pre-training have shown great potential for many visual recognition and object detection tasks. A natural question is whether the inpainting task can benefit substantially from a ViT backbone. However, it is nontrivial to directly swap the new backbones into inpainting networks, as inpainting is an inverse problem fundamentally different from recognition tasks. To this end, this paper incorporates a pre-trained Masked AutoEncoder (MAE) into the inpainting model, providing richer informative priors to enhance the inpainting process. Moreover, we propose to use attention priors from the MAE to help the inpainting model learn more long-distance dependencies between masked and unmasked regions. Ablation studies of the inpainting and self-supervised pre-training models are discussed in this paper. In addition, experiments on both Places2 and FFHQ demonstrate the effectiveness of the proposed model. Code and pre-trained models are released at https://github.com/ewrfcas/MAE-FAR.