Paper Title

Delving Globally into Texture and Structure for Image Inpainting

Paper Authors

Haipeng Liu, Yang Wang, Meng Wang, Yong Rui

Paper Abstract

Image inpainting has achieved remarkable progress and inspired abundant methods, where the critical bottleneck is identified as how to fill the high-frequency structure and low-frequency texture information of the masked regions with semantics. To this end, deep models exhibit powerful superiority in capturing them, yet are constrained to local spatial regions. In this paper, we delve globally into texture and structure information to capture the semantics for image inpainting. As opposed to existing arts trapped in independent local patches, the texture information of each patch is reconstructed from all other patches across the whole image, to match the coarsely filled information, especially the structure information over the masked regions. Unlike current decoder-only transformers operating at the pixel level for image inpainting, our model adopts a transformer pipeline with both an encoder and a decoder. On one hand, the encoder captures the texture semantic correlations of all patches across the image via a self-attention module. On the other hand, an adaptive patch vocabulary is dynamically established in the decoder for the filled patches over the masked regions. Building on this, a structure-texture matching attention module anchored on the known regions marries the best of these two worlds for progressive inpainting via a probabilistic diffusion process. Our model is orthogonal to fashionable arts such as Convolutional Neural Networks (CNNs), attention, and transformer models, from the perspective of texture and structure information for image inpainting. Extensive experiments on the benchmarks validate its superiority. Our code is available at https://github.com/htyjers/DGTS-Inpainting.
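The abstract describes two attention stages: encoder self-attention that relates every patch to all other patches across the image, and a decoder-side structure-texture matching attention that fills masked patches from an adaptive vocabulary of known-region patches. The sketch below is a minimal, hypothetical PyTorch illustration of that two-stage idea only (the coarse filling step and the probabilistic diffusion process for progressive inpainting are omitted); all class names, parameter names, and dimensions are assumptions, not the authors' implementation. See the repository linked above for the released code.

```python
# Hypothetical sketch of the two attention stages described in the abstract.
# Not the authors' code: names, dimensions, and structure are assumptions.
import torch
import torch.nn as nn


class StructureTextureMatchingAttention(nn.Module):
    """Cross-attention: masked-patch queries attend over known-region patches."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, masked_tokens, known_tokens):
        # masked_tokens: (B, M, D) coarsely filled patches on the masked regions
        # known_tokens:  (B, N, D) adaptive "vocabulary" built from known patches
        attn = torch.softmax(
            self.q(masked_tokens) @ self.k(known_tokens).transpose(-2, -1) * self.scale,
            dim=-1,
        )
        return attn @ self.v(known_tokens)


class GlobalInpaintingSketch(nn.Module):
    """Encoder: global self-attention over all patches; decoder: matching attention."""

    def __init__(self, dim=256, heads=8, depth=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.match = StructureTextureMatchingAttention(dim)

    def forward(self, patch_tokens, mask):
        # patch_tokens: (B, L, D) tokens for all image patches
        # mask: (B, L) boolean, True where a patch lies in the masked region
        tokens = self.encoder(patch_tokens)  # texture correlations across the whole image
        out = tokens.clone()
        for b in range(tokens.size(0)):  # split each sample into known / masked patches
            known = tokens[b, ~mask[b]].unsqueeze(0)
            filled = self.match(tokens[b, mask[b]].unsqueeze(0), known)
            out[b, mask[b]] = filled.squeeze(0)
        return out


# Illustrative usage: a 32x32 grid of 256-d patch tokens, ~40% masked.
# model = GlobalInpaintingSketch()
# x = torch.randn(2, 1024, 256)
# m = torch.rand(2, 1024) < 0.4
# y = model(x, m)  # (2, 1024, 256) with masked tokens re-filled
```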
