Paper Title

Relation Regularized Scene Graph Generation

Authors

Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li

Abstract

Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations for describing the image content abstraction. Existing works have revealed that if the links between objects are given as prior knowledge, the performance of SGG is significantly improved. Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement for better SGG. Specifically, we first construct an affinity matrix among detected objects to represent the probability of a relationship between two objects. Graph convolution networks (GCNs) over this relation affinity matrix are then used as object encoders, producing relation-regularized representations of objects. With these relation-regularized features, our R2-Net can effectively refine object labels and generate scene graphs. Extensive experiments are conducted on the Visual Genome dataset for three SGG tasks (i.e., predicate classification, scene graph classification, and scene graph detection), demonstrating the effectiveness of our proposed method. Ablation studies also verify the key roles of our proposed components in performance improvement.
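The pipeline the abstract describes — score a relation affinity matrix over detected objects, then run a GCN over that matrix to produce relation-regularized object features — can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the bilinear affinity score, the symmetric normalization, and all weights and dimensions below are illustrative assumptions.

```python
import numpy as np

def relation_affinity(features, w_a):
    """Hypothetical pairwise relation probabilities via a bilinear score + sigmoid."""
    scores = features @ w_a @ features.T      # (n_obj, n_obj) raw scores
    return 1.0 / (1.0 + np.exp(-scores))      # sigmoid -> probabilities in (0, 1)

def gcn_layer(affinity, features, w):
    """One GCN layer over the (soft) affinity matrix."""
    a = affinity + np.eye(affinity.shape[0])  # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a.sum(axis=1)))
    a_norm = d_inv_sqrt @ a @ d_inv_sqrt      # symmetric normalization
    return np.maximum(a_norm @ features @ w, 0.0)  # ReLU activation

rng = np.random.default_rng(0)
n_obj, d = 5, 16                              # toy sizes: 5 objects, 16-dim features
feats = rng.standard_normal((n_obj, d))       # stand-in for detector features
w_a = rng.standard_normal((d, d)) * 0.1       # bilinear affinity weights (assumed)
w = rng.standard_normal((d, d)) * 0.1         # GCN layer weights (assumed)

A = relation_affinity(feats, w_a)             # soft adjacency among objects
refined = gcn_layer(A, feats, w)              # relation-regularized object features
```

In the paper's framing, the refined features would then feed the object-label and predicate classifiers; here they are simply the GCN output.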
