Paper Title

SeeSaw: Interactive Ad-hoc Search Over Image Databases

Paper Authors

Oscar Moll, Manuel Favela, Samuel Madden, Vijay Gadepally, Michael Cafarella

Paper Abstract

As image datasets become ubiquitous, the problem of ad-hoc searches over image data is increasingly important. Many high-level data tasks in machine learning, such as constructing datasets for training and testing object detectors, imply finding ad-hoc objects or scenes within large image datasets as a key sub-problem. New foundational visual-semantic embeddings such as Contrastive Language-Image Pre-Training (CLIP), trained on massive web datasets, can help users start searches on their own data, but we find there is a long tail of queries where these models fall short in practice. SeeSaw is a system for interactive ad-hoc searches on image datasets that integrates state-of-the-art embeddings like CLIP with user feedback in the form of box annotations to help users quickly locate images of interest in their data, even in the long tail of harder queries. One key challenge for SeeSaw is that, in practice, many sensible approaches to incorporating feedback into future results, including state-of-the-art active-learning algorithms, can worsen results compared to introducing no feedback, partly due to CLIP's high average performance. Therefore, SeeSaw includes several algorithms that empirically result in larger and more consistent improvements. We compare SeeSaw's accuracy to both using CLIP alone and to a state-of-the-art active-learning baseline, and find SeeSaw consistently helps improve results for users across four datasets and more than a thousand queries. SeeSaw increases Average Precision (AP) on search tasks by an average of .08 on a wide benchmark (from a base of .72), and by .27 on a subset of more difficult queries where CLIP alone performs poorly.
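
The zero-feedback baseline that the abstract starts from, ranking images by the similarity between a CLIP text embedding and CLIP image embeddings, looks roughly like the sketch below. This is a generic illustration, not SeeSaw's actual code or algorithm; the open_clip model name and pretrained tag, the query string, and the image file paths are illustrative assumptions.

```python
# Minimal sketch of CLIP-alone text-to-image search (the baseline SeeSaw builds on).
# Assumptions: open_clip with the ViT-B-32 / laion2b_s34b_b79k checkpoint, and
# placeholder image paths; none of this is taken from the SeeSaw paper itself.
import torch
from PIL import Image
import open_clip  # https://github.com/mlfoundations/open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image_paths = ["img_0001.jpg", "img_0002.jpg"]  # placeholder dataset

with torch.no_grad():
    # Embed and L2-normalize all images; a real system would precompute and cache these.
    images = torch.stack([preprocess(Image.open(p)) for p in image_paths])
    image_vecs = model.encode_image(images)
    image_vecs /= image_vecs.norm(dim=-1, keepdim=True)

    # Embed the free-text query into the same vector space.
    text = tokenizer(["a dog wearing a harness"])
    query_vec = model.encode_text(text)
    query_vec /= query_vec.norm(dim=-1, keepdim=True)

# After normalization, cosine similarity is a dot product; higher scores rank first.
scores = (image_vecs @ query_vec.T).squeeze(1)
ranking = scores.argsort(descending=True)
for idx in ranking:
    print(image_paths[idx], float(scores[idx]))
```

Per the abstract, SeeSaw's contribution lies in how box-annotation feedback refines this initial ranking without degrading it, which the sketch above deliberately does not attempt.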
