Paper Title
Differentiated Relevances Embedding for Group-based Referring Expression Comprehension
Authors
Abstract
The key to referring expression comprehension (REC) lies in capturing cross-modal visual-linguistic relevance. Existing works typically model this relevance within each image, where the anchor object/expression and its positive expression/object share the same attribute as the negative expression/object but differ in attribute value. These objects/expressions are used exclusively to learn an implicit representation of the attribute from a single pair of different values. This limits the accuracy of the attribute representations, the expression/object representations, and their cross-modal relevances, since each anchor object/expression usually has multiple attributes and each attribute usually has multiple potential values. To this end, we investigate a novel REC problem named Group-based REC, in which each object/expression is simultaneously employed to construct multiple triplets across semantically similar images. To tackle the explosion of negatives and the differentiation of the anchor-negative relevance scores, we propose a multi-group self-paced relevance learning scheme that adaptively assigns different priorities to within-group object-expression pairs based on their cross-modal relevances. Since the average cross-modal relevance varies considerably across groups, we further design an across-group relevance constraint to balance the bias of the group priority. Experiments on three standard REC benchmarks demonstrate the effectiveness and superiority of our method.
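To make the ideas in the abstract concrete, the following is a minimal sketch of a self-paced, group-wise triplet loss over relevance scores. It is not the formulation from the paper, which the abstract does not specify. The hinge loss, the linear soft self-paced weight, the per-group averaging used as a stand-in for the across-group constraint, and all names (`triplet_hinge`, `self_paced_weight`, `weighted_group_loss`, `multi_group_loss`, `margin`, `pace`) are assumptions chosen for illustration.

```python
from typing import Dict, List


def triplet_hinge(pos: float, neg: float, margin: float = 0.2) -> float:
    """Hinge loss for one triplet: the anchor-positive relevance `pos`
    should exceed the anchor-negative relevance `neg` by `margin`."""
    return max(0.0, margin + neg - pos)


def self_paced_weight(loss: float, pace: float) -> float:
    """Linear soft self-paced weight (an assumed choice): triplets with a
    small loss get a weight near 1, triplets with loss >= pace get 0."""
    if pace <= 0:
        raise ValueError("pace must be positive")
    return max(0.0, 1.0 - loss / pace)


def weighted_group_loss(pos: float, negs: List[float],
                        margin: float = 0.2, pace: float = 1.0) -> float:
    """Average self-paced-weighted hinge loss over one group's negatives."""
    losses = [triplet_hinge(pos, n, margin) for n in negs]
    weights = [self_paced_weight(l, pace) for l in losses]
    return sum(w * l for w, l in zip(weights, losses)) / len(negs)


def multi_group_loss(pos: float, groups: Dict[str, List[float]],
                     margin: float = 0.2, pace: float = 1.0) -> float:
    """Mean of per-group losses. Averaging per group first keeps a group
    with many or highly relevant negatives from dominating; this is only a
    simple stand-in for an across-group balancing constraint."""
    per_group = [weighted_group_loss(pos, negs, margin, pace)
                 for negs in groups.values() if negs]
    return sum(per_group) / len(per_group) if per_group else 0.0


if __name__ == "__main__":
    # Hypothetical relevance scores of one anchor against negatives
    # drawn from two groups of semantically similar images.
    groups = {"group_a": [0.8], "group_b": [0.1]}
    print(multi_group_loss(pos=0.5, groups=groups))
```

In this sketch, raising `pace` over the course of training would gradually admit harder triplets, which is the usual motivation for self-paced weighting. How the paper schedules priorities is not stated in the abstract.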