Paper Title
Bridging Knowledge Graphs to Generate Scene Graphs
Paper Authors
Paper Abstract
Scene graphs are powerful representations that parse images into their abstract semantic elements, i.e., objects and their interactions, which facilitates visual comprehension and explainable reasoning. On the other hand, commonsense knowledge graphs are rich repositories that encode how the world is structured, and how general concepts interact. In this paper, we present a unified formulation of these two constructs, where a scene graph is seen as an image-conditioned instantiation of a commonsense knowledge graph. Based on this new perspective, we re-formulate scene graph generation as the inference of a bridge between the scene and commonsense graphs, where each entity or predicate instance in the scene graph has to be linked to its corresponding entity or predicate class in the commonsense graph. To this end, we propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them, while gradually refining their bridge in each iteration. Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs. Through extensive experimentation, we showcase the superior accuracy of GB-Net compared to the most recent methods, resulting in a new state of the art. We publicly release the source code of our method.
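The core idea of the abstract — iteratively propagating messages within each graph and across a gradually refined bridge between them — can be sketched in a few lines. This is a minimal illustrative toy, not the paper's actual GB-Net architecture: the function name `gb_net_sketch`, the similarity-based soft bridge, and the simple averaging updates are all assumptions chosen for brevity.

```python
import numpy as np

def normalize_rows(m):
    # Row-normalize a non-negative matrix, guarding against zero rows.
    s = m.sum(axis=1, keepdims=True)
    return m / np.maximum(s, 1e-8)

def gb_net_sketch(scene_x, common_x, scene_adj, common_adj, n_iters=3):
    """Toy sketch of bridged message passing between a scene graph and a
    commonsense graph. All update rules here are illustrative placeholders,
    not the formulation used in the paper."""
    for _ in range(n_iters):
        # 1. Refine the bridge: soft assignment of each scene-graph
        #    instance to a commonsense class via feature similarity.
        bridge = normalize_rows(np.exp(scene_x @ common_x.T))
        # 2. Propagate within each graph (mean over graph neighbors).
        scene_msg = normalize_rows(scene_adj) @ scene_x
        common_msg = normalize_rows(common_adj) @ common_x
        # 3. Propagate across the bridge in both directions.
        scene_from_common = bridge @ common_x
        common_from_scene = normalize_rows(bridge.T) @ scene_x
        # 4. Update node states (plain averaging in this sketch).
        scene_x = (scene_x + scene_msg + scene_from_common) / 3.0
        common_x = (common_x + common_msg + common_from_scene) / 3.0
    return bridge  # final soft instance-to-class links

# Toy usage: 2 scene instances, 3 commonsense classes, 4-d features.
rng = np.random.default_rng(0)
scene_x = rng.standard_normal((2, 4))
common_x = rng.standard_normal((3, 4))
scene_adj = np.ones((2, 2)) - np.eye(2)    # fully connected scene graph
common_adj = np.ones((3, 3)) - np.eye(3)   # fully connected commonsense graph
bridge = gb_net_sketch(scene_x, common_x, scene_adj, common_adj)
print(bridge.shape)  # (2, 3); each row is a distribution over classes
```

Reading out the final `bridge` matrix corresponds to the linking step described above: each row assigns one scene-graph instance a distribution over commonsense classes.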