A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

Paper title

A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch

Authors

Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, James Hays

Abstract

We address the problem of retrieving images with both a sketch and a text query. We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for image retrieval using a text description and a sketch as input. We argue that both input modalities complement each other in a manner that cannot be achieved easily by either one alone. TASK-former follows the late-fusion dual-encoder approach, similar to CLIP, which allows efficient and scalable retrieval since the retrieval set can be indexed independently of the queries. We empirically demonstrate that using an input sketch (even a poorly drawn one) in addition to text considerably increases retrieval recall compared to traditional text-based image retrieval. To evaluate our approach, we collect 5,000 hand-drawn sketches for images in the test set of the COCO dataset. The collected sketches are available at https://janesjanes.github.io/tsbir/.
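The late-fusion dual-encoder idea in the abstract can be illustrated with a toy example. This is only a sketch under stated assumptions, not the TASK-former implementation: the encoders are replaced by a hypothetical `fake_embedding` function that returns deterministic random unit vectors, and the text and sketch embeddings are fused by simple addition, which the abstract does not specify. The point it shows is that image embeddings are computed once, independently of any query, so a query needs only one pass over the precomputed index.

```python
import math
import random

DIM = 8  # embedding size, chosen arbitrarily for this illustration


def normalize(vec):
    """Scale a vector to unit length so dot products act as cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def fake_embedding(key):
    """Hypothetical stand-in for a trained encoder (assumption, not the paper's model)."""
    rng = random.Random(key)  # seeding with a string makes the output deterministic
    return normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)])


def build_index(image_ids):
    """Embed every image once; no query information is needed at this stage."""
    return {image_id: fake_embedding("image:" + image_id) for image_id in image_ids}


def embed_query(text, sketch_id):
    """Encode each modality separately, then fuse late (addition is an assumption)."""
    text_vec = fake_embedding("text:" + text)
    sketch_vec = fake_embedding("sketch:" + sketch_id)
    return normalize([t + s for t, s in zip(text_vec, sketch_vec)])


def retrieve(index, query_vec, k=3):
    """Rank indexed images by cosine similarity to the fused query embedding."""
    scores = {
        image_id: sum(a * b for a, b in zip(vec, query_vec))
        for image_id, vec in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


index = build_index(["img_001", "img_002", "img_003", "img_004"])
query = embed_query("a dog on a beach", "sketch_17")
top_matches = retrieve(index, query, k=2)
```

Because the index depends only on the images, it can be built offline and reused for every query; only `embed_query` and the similarity ranking run at query time.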
