Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks

Paper Title

Explanation-based Weakly-supervised Learning of Visual Relations with Graph Networks

Authors

Federico Baldassarre, Kevin Smith, Josephine Sullivan, Hossein Azizpour

Abstract

Visual relationship detection is fundamental for holistic image understanding. However, the localization and classification of (subject, predicate, object) triplets remain challenging tasks, due to the combinatorial explosion of possible relationships, their long-tailed distribution in natural images, and an expensive annotation process. This paper introduces a novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels. A graph neural network is trained to classify predicates in images from a graph representation of detected objects, implicitly encoding an inductive bias for pairwise relations. We then frame relationship detection as the explanation of such a predicate classifier, i.e. we obtain a complete relation by recovering the subject and object of a predicted predicate. We present results comparable to recent fully- and weakly-supervised methods on three diverse and challenging datasets: HICO-DET for human-object interaction, Visual Relationship Detection for generic object-to-object relations, and UnRel for unusual triplets; demonstrating robustness to non-comprehensive annotations and good few-shot generalization.
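The idea of treating relationship detection as the explanation of an image-level predicate classifier can be illustrated with a toy sketch. This is not the paper's graph neural network or its explanation technique: it assumes a hypothetical linear scorer over ordered object pairs, made-up feature vectors, and hand-picked weights, and it attributes the image-level score to the pair with the largest contribution. All function names are illustrative.

```python
import itertools


def edge_score(subj, obj, weights):
    """Linear score of one ordered (subject, object) pair for one predicate."""
    features = subj + obj  # concatenate the two feature vectors
    return sum(w * f for w, f in zip(weights, features))


def predicate_logit(objects, weights):
    """Image-level logit: sum of edge scores over all ordered object pairs."""
    pairs = itertools.permutations(range(len(objects)), 2)
    return sum(edge_score(objects[i], objects[j], weights) for i, j in pairs)


def explain(objects, weights):
    """Return the ordered pair (i, j) that contributes most to the logit."""
    pairs = itertools.permutations(range(len(objects)), 2)
    return max(
        pairs,
        key=lambda p: edge_score(objects[p[0]], objects[p[1]], weights),
    )


# Toy image with three detected objects, each a 2-d feature vector (made up).
objects = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical weights for a single predicate (4 values per concatenated pair).
weights = [2.0, -1.0, -1.0, 2.0]

logit = predicate_logit(objects, weights)
subject_idx, object_idx = explain(objects, weights)
print(f"image-level logit: {logit}")
print(f"recovered subject index: {subject_idx}, object index: {object_idx}")
```

In this sketch only the image-level score would need a label during training, while the (subject, object) pair is read off afterwards from the per-pair contributions, which mirrors the weak-supervision setting the abstract describes.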
