Relevance-based Margin for Contrastively-trained Video Retrieval Models

Paper Title

Relevance-based Margin for Contrastively-trained Video Retrieval Models

Authors

Alex Falcon, Swathikiran Sudhakaran, Giuseppe Serra, Sergio Escalera, Oswald Lanz

Abstract

Video retrieval using natural language queries has attracted increasing interest due to its relevance in real-world applications, from intelligent access to private media galleries to web-scale video search. Learning the cross-similarity of video and text in a joint embedding space is the dominant approach. To do so, a contrastive loss is usually employed because it organizes the embedding space by placing similar items close together and dissimilar items far apart. This framework leads to competitive recall rates, since those metrics focus solely on the rank of the groundtruth items. Yet assessing the quality of the whole ranking list is of utmost importance for intelligent retrieval systems, since multiple items may share similar semantics and hence have high relevance. Moreover, the aforementioned framework uses a fixed margin to separate similar and dissimilar items, treating all non-groundtruth items as equally irrelevant. In this paper we propose to use a variable margin: we argue that varying the margin used during training according to how relevant an item is to a given query, i.e. a relevance-based margin, easily improves the quality of the ranking lists as measured by nDCG and mAP. We demonstrate the advantages of our technique using different models on EPIC-Kitchens-100 and YouCook2. We show that even if we carefully tuned the fixed margin, our technique (which does not have the margin as a hyper-parameter) would still achieve better performance. Finally, extensive ablation studies and qualitative analysis support the robustness of our approach. Code will be released at https://github.com/aranciokov/RelevanceMargin-ICMR22.
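
To illustrate the difference between a fixed and a relevance-based margin, the following is a minimal sketch of a hinge-style triplet term in plain Python. It is not the authors' implementation: the function names are hypothetical, the relevance scores are assumed to lie in [0, 1], and the mapping `margin = 1 - relevance` is an assumption chosen only to show a margin that shrinks as a negative item becomes more relevant to the query. The abstract does not specify the exact formula.

```python
from typing import Sequence


def triplet_term(sim_pos: float, sim_neg: float, margin: float) -> float:
    """Hinge-style triplet term: max(0, margin + s(q, neg) - s(q, pos))."""
    return max(0.0, margin + sim_neg - sim_pos)


def relevance_margin(relevance: float) -> float:
    """Assumed mapping (not taken from the abstract): margin = 1 - relevance."""
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance is assumed to lie in [0, 1]")
    return 1.0 - relevance


def fixed_margin_loss(sim_pos: float, sims_neg: Sequence[float],
                      margin: float) -> float:
    """Mean triplet term when every negative shares the same margin."""
    return sum(triplet_term(sim_pos, s, margin) for s in sims_neg) / len(sims_neg)


def relevance_margin_loss(sim_pos: float, sims_neg: Sequence[float],
                          relevances_neg: Sequence[float]) -> float:
    """Mean triplet term when each negative gets its own relevance-based margin."""
    if len(sims_neg) != len(relevances_neg):
        raise ValueError("one relevance score is needed per negative")
    terms = [
        triplet_term(sim_pos, s, relevance_margin(r))
        for s, r in zip(sims_neg, relevances_neg)
    ]
    return sum(terms) / len(terms)


# Toy values: the first negative is semantically close to the query
# (relevance 0.75), the second is unrelated (relevance 0.0).
sim_pos = 0.75
sims_neg = [0.5, 0.25]
relevances_neg = [0.75, 0.0]

loss_fixed = fixed_margin_loss(sim_pos, sims_neg, margin=0.5)
loss_relevance = relevance_margin_loss(sim_pos, sims_neg, relevances_neg)
```

With these toy values the fixed margin penalizes only the relevant negative, while the relevance-based margin leaves it alone and instead pushes the unrelated negative further away, which is the behavior the abstract argues for.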
