Paper Title
Object-Level Targeted Selection via Deep Template Matching
Paper Authors
Paper Abstract
Retrieving images with objects that are semantically similar to objects of interest (OOI) in a query image has many practical use cases. A few examples include fixing failures like false negatives/positives of a learned model or mitigating class imbalance in a dataset. The targeted selection task requires finding the relevant data from a large-scale pool of unlabeled data. Manual mining at this scale is infeasible. Further, the OOI are often small and occupy less than 1% of image area, are occluded, and co-exist with many semantically different objects in cluttered scenes. Existing semantic image retrieval methods often focus on mining for larger sized geographical landmarks, and/or require extra labeled data, such as images/image-pairs with similar objects, for mining images with generic objects. We propose a fast and robust template matching algorithm in the DNN feature space, that retrieves semantically similar images at the object-level from a large unlabeled pool of data. We project the region(s) around the OOI in the query image to the DNN feature space for use as the template. This enables our method to focus on the semantics of the OOI without requiring extra labeled data. In the context of autonomous driving, we evaluate our system for targeted selection by using failure cases of object detectors as OOI. We demonstrate its efficacy on a large unlabeled dataset with 2.2M images and show high recall in mining for images with small-sized OOI. We compare our method against a well-known semantic image retrieval method, which also does not require extra labeled data. Lastly, we show that our method is flexible and retrieves images with one or more semantically different co-occurring OOI seamlessly.
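The core idea described above — cropping the OOI region from a query image's DNN feature map and sliding it over pool-image feature maps — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the cosine-similarity scoring, and the toy random "features" (standing in for a real DNN backbone's output) are all assumptions for demonstration.

```python
import numpy as np

def extract_template(query_feat, box):
    """Crop the OOI region from a query feature map of shape (C, H, W).
    box = (x0, y0, x1, y1) in feature-map coordinates."""
    x0, y0, x1, y1 = box
    return query_feat[:, y0:y1, x0:x1]

def match_score(pool_feat, template):
    """Slide the template over a pool image's feature map and return
    the best cosine similarity (higher = more semantically similar)."""
    c, th, tw = template.shape
    _, h, w = pool_feat.shape
    t = template.ravel()
    t = t / (np.linalg.norm(t) + 1e-8)
    best = -1.0
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            p = pool_feat[:, y:y + th, x:x + tw].ravel()
            p = p / (np.linalg.norm(p) + 1e-8)
            best = max(best, float(t @ p))
    return best

# Toy example with random "features"; in practice these would come
# from a DNN backbone applied to the query and pool images.
rng = np.random.default_rng(0)
query = rng.standard_normal((8, 16, 16)).astype(np.float32)
tmpl = extract_template(query, (4, 4, 8, 8))

pool_same = query.copy()  # this pool feature map contains the template exactly
pool_diff = rng.standard_normal((8, 16, 16)).astype(np.float32)

print(match_score(pool_same, tmpl))  # best match ≈ 1.0
print(match_score(pool_diff, tmpl) < match_score(pool_same, tmpl))
```

Ranking pool images by this score and keeping the top-k yields the object-level retrieval behavior the abstract describes; the real system would compute feature maps once per pool image and use a vectorized correlation rather than the explicit loops shown here.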