Paper Title
KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue
Paper Authors
Paper Abstract
Visual dialogue is a challenging task that requires extracting implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches focus on integrating the current question with visual knowledge and textual knowledge, while neglecting the heterogeneous semantic gaps between cross-modal information. Meanwhile, concatenation has become the de facto standard for cross-modal information fusion, yet it has limited capacity for information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between visual and textual knowledge in fine granularity, and retrieves the required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on the VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing approaches, achieving state-of-the-art results.
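To make the two mechanisms named in the abstract concrete, the sketch below illustrates one plausible reading of a cross-modal "bridge" with adaptive information selection: a bipartite attention step retrieves knowledge from the other modality's nodes, and a learned gate decides how much of the retrieved knowledge to keep versus the original representation. This is a minimal PyTorch sketch under assumed shapes and module names (CrossModalBridge, dim, node counts), not the authors' implementation.

```python
# Illustrative sketch only (not the KBGN release): (1) a cross-modal bridge
# relating vision and text nodes in fine granularity, and (2) an adaptive
# gate selecting how much bridged knowledge to retrieve. Shapes are assumed.
import torch
import torch.nn as nn

class CrossModalBridge(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)    # project query-modality nodes
        self.k_proj = nn.Linear(dim, dim)    # project the other modality's nodes
        self.v_proj = nn.Linear(dim, dim)
        self.gate = nn.Linear(2 * dim, dim)  # adaptive information selection

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (B, Nx, D) nodes of one modality; y: (B, Ny, D) of the other.
        # Scaled dot-product attention over the bipartite vision-text edges.
        attn = torch.softmax(
            self.q_proj(x) @ self.k_proj(y).transpose(1, 2) / x.size(-1) ** 0.5,
            dim=-1,
        )
        bridged = attn @ self.v_proj(y)          # (B, Nx, D) retrieved knowledge
        g = torch.sigmoid(self.gate(torch.cat([x, bridged], dim=-1)))
        return g * bridged + (1 - g) * x         # adaptively fuse or keep original

# Usage: e.g. 36 vision nodes (region features) and 10 text nodes (history turns).
bridge = CrossModalBridge(dim=512)
vision = torch.randn(2, 36, 512)
text = torch.randn(2, 10, 512)
out = bridge(vision, text)
print(out.shape)  # torch.Size([2, 36, 512])
```

The sigmoid gate is one simple way to realize "adaptive information selection": per node and per dimension, the model weighs bridged cross-modal knowledge against its intra-modal representation instead of fusing them by fixed concatenation.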