Paper Title

Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?

Paper Authors

Maurits Bleeker, Maarten de Rijke

Paper Abstract

The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch. Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning. We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods. We answer this question negatively: the triplet loss with semi-hard negative mining still outperforms newly introduced loss functions from metric learning on the ICR task. To gain a better understanding of these outcomes, we introduce an analysis method to compare loss functions by counting how many samples contribute to the gradient w.r.t. the query representation during optimization. We find that loss functions that result in lower evaluation scores on the ICR task, in general, take too many (non-informative) samples into account when computing a gradient w.r.t. the query representation, which results in sub-optimal performance. The triplet loss with semi-hard negatives is shown to outperform the other loss functions, as it only takes one (hard) negative into account when computing the gradient.

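To make the mechanism concrete, below is a minimal PyTorch sketch of a triplet loss with in-batch semi-hard negative mining, assuming the common FaceNet-style definition of "semi-hard" (candidates less similar to the query than its matching positive) and cosine similarity over L2-normalized embeddings. The function name `semi_hard_triplet_loss` and the margin value 0.2 are illustrative assumptions, not the authors' released code. The per-query count it returns mirrors the paper's analysis: with semi-hard mining, at most one negative contributes to the gradient w.r.t. each query representation.

```python
import torch
import torch.nn.functional as F

def semi_hard_triplet_loss(queries, candidates, margin=0.2):
    """Triplet loss with in-batch semi-hard negative mining (illustrative sketch).

    queries:    (B, D) L2-normalized embeddings of one modality (e.g., images).
    candidates: (B, D) L2-normalized embeddings of the other (e.g., captions),
                where candidates[i] is the positive for queries[i].
    """
    sim = queries @ candidates.t()                      # (B, B) cosine similarities
    pos = sim.diag()                                    # similarity to the matching positive
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(eye, float("-inf"))           # exclude the positive pair
    # Semi-hard negatives: less similar to the query than the positive is.
    semi_hard = neg.masked_fill(neg >= pos.unsqueeze(1), float("-inf"))
    hardest, _ = semi_hard.max(dim=1)                   # one mined negative per query
    hinge = torch.clamp(margin + hardest - pos, min=0)  # 0 if no negative violates the margin
    # Per-query count of negatives with a nonzero gradient w.r.t. the query:
    # by construction it is at most the single mined negative.
    contributing = (hinge > 0).long()
    return hinge.mean(), contributing

# Toy usage with random embeddings:
torch.manual_seed(0)
images = F.normalize(torch.randn(8, 128), dim=-1)
captions = F.normalize(torch.randn(8, 128), dim=-1)
loss, contributing = semi_hard_triplet_loss(images, captions)
print(loss.item(), contributing.tolist())
```

By contrast, losses that aggregate over all in-batch negatives give every sample some gradient w.r.t. the query representation, which is the "too many (non-informative) samples" behavior the abstract associates with lower ICR evaluation scores.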