Paper Title

Text-Guided Neural Image Inpainting

Paper Authors

Lisai Zhang, Qingcai Chen, Baotian Hu, Shuoran Jiang

Paper Abstract

The image inpainting task requires filling corrupted images with content coherent with the context. This research field has achieved promising progress with neural image inpainting methods. Nevertheless, guessing the missing content from the context pixels alone remains a critical challenge. The goal of this paper is to fill in the semantic information of corrupted images according to a provided descriptive text. Unlike existing text-guided image generation work, the inpainting model must compare the semantic content of the given text with the remaining part of the image and then determine the semantic content that should be filled in for the missing part. To fulfill this task, we propose a novel inpainting model named the Text-Guided Dual Attention Inpainting Network (TDANet). First, a dual multimodal attention mechanism is designed to extract explicit semantic information about the corrupted regions, which is done by comparing the descriptive text and the complementary image areas through reciprocal attention. Second, an image-text matching loss is applied to maximize the semantic similarity between the generated image and the text. Experiments are conducted on two open datasets. Results show that the proposed TDANet model achieves a new state of the art on both quantitative and qualitative measures. Analysis of the results suggests that the generated images are consistent with the guidance text, which enables diverse results to be generated by providing different descriptions. Code is available at https://github.com/idealwhite/TDANet
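The abstract names two mechanisms: a reciprocal (dual) attention that compares the guidance text with the remaining image regions, and an image-text matching loss that maximizes the semantic similarity between the generated image and the text. Below is a minimal PyTorch sketch of both ideas, not the authors' implementation (see the linked repository for that); the function names, tensor shapes, pooling choices, and the specific cosine-similarity loss form are illustrative assumptions.

import torch
import torch.nn.functional as F

def reciprocal_attention(text_feats, image_feats):
    # text_feats:  (B, T, D) token embeddings of the guidance text
    # image_feats: (B, R, D) features of the remaining (uncorrupted) regions
    # Pairwise similarity between every text token and every image region.
    sim = torch.bmm(text_feats, image_feats.transpose(1, 2))               # (B, T, R)
    # Text -> image: each token aggregates the regions it attends to.
    text_ctx = torch.bmm(F.softmax(sim, dim=-1), image_feats)              # (B, T, D)
    # Image -> text: each region aggregates the tokens it attends to.
    image_ctx = torch.bmm(F.softmax(sim.transpose(1, 2), dim=-1), text_feats)  # (B, R, D)
    return text_ctx, image_ctx

def image_text_matching_loss(image_emb, text_emb):
    # image_emb, text_emb: (B, D) pooled embeddings of the generated image
    # and the guidance text; drive their cosine similarity toward 1.
    return 1.0 - F.cosine_similarity(image_emb, text_emb, dim=-1).mean()

# Toy usage with random features (mean pooling here is an assumption).
text = torch.randn(2, 12, 256)    # 12 text tokens, 256-d
image = torch.randn(2, 49, 256)   # 7x7 grid of image regions, 256-d
text_ctx, image_ctx = reciprocal_attention(text, image)
loss = image_text_matching_loss(image_ctx.mean(dim=1), text.mean(dim=1))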
