Paper title
Neuro-Symbolic Visual Dialog
Paper authors
Paper abstract
We propose Neuro-Symbolic Visual Dialog (NSVD), the first method to combine deep learning and symbolic program execution for multi-round visually grounded reasoning. NSVD significantly outperforms existing purely connectionist methods on two key challenges inherent to visual dialog: long-distance co-reference resolution and vanishing question-answering performance. We demonstrate the latter by proposing a more realistic and stricter evaluation scheme in which we use predicted answers for the full dialog history when calculating accuracy. We describe two variants of our model and show that, using this new scheme, our best model achieves an accuracy of 99.72% on CLEVR-Dialog, a relative improvement of more than 10% over the state of the art, while requiring only a fraction of the training data. Moreover, we demonstrate that our neuro-symbolic models have a higher mean first failure round, are more robust against incomplete dialog histories, and generalise better not only to dialogs that are up to three times longer than those seen during training but also to unseen question types and scenes.
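The stricter evaluation scheme and the mean first failure round can be illustrated with a small sketch. This is not the authors' evaluation code: the model interface (`AnswerFn`), the function names, and the toy model are hypothetical, and the sketch assumes that the first failure round is the 1-based index of the first wrong answer in a dialog, with a dialog that is answered correctly throughout counted as failing one round past its end.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface (an assumption, not the NSVD API):
# (question, history of (question, answer) pairs) -> predicted answer.
History = List[Tuple[str, str]]
AnswerFn = Callable[[str, History], str]


def evaluate_dialog(model: AnswerFn,
                    questions: Sequence[str],
                    gold_answers: Sequence[str],
                    use_predicted_history: bool = True) -> List[bool]:
    """Answer a dialog round by round and record per-round correctness.

    With use_predicted_history=True the history is built from the model's
    own answers (the stricter scheme described in the abstract); with False
    it is built from the ground-truth answers.
    """
    history: History = []
    correct: List[bool] = []
    for question, gold in zip(questions, gold_answers):
        predicted = model(question, history)
        correct.append(predicted == gold)
        history.append((question, predicted if use_predicted_history else gold))
    return correct


def first_failure_round(correct: Sequence[bool]) -> int:
    """1-based round of the first wrong answer.

    Assumption: a dialog with no wrong answer counts as len(correct) + 1.
    """
    for round_index, ok in enumerate(correct, start=1):
        if not ok:
            return round_index
    return len(correct) + 1


def summarize(per_dialog: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Return (per-round accuracy, mean first failure round) over all dialogs."""
    total_rounds = sum(len(c) for c in per_dialog)
    accuracy = sum(sum(c) for c in per_dialog) / total_rounds
    mean_ffr = sum(first_failure_round(c) for c in per_dialog) / len(per_dialog)
    return accuracy, mean_ffr


def toy_model(question: str, history: History) -> str:
    """Toy stand-in: always wrong on "q2", otherwise copies the last answer in the history."""
    if question == "q2":
        return "no"
    return history[-1][1] if history else "yes"


questions = ["q1", "q2", "q3"]
gold = ["yes", "yes", "yes"]
strict = evaluate_dialog(toy_model, questions, gold, use_predicted_history=True)
lenient = evaluate_dialog(toy_model, questions, gold, use_predicted_history=False)
```

In this toy run the error in round 2 carries over into round 3 only when the history holds predicted answers, which is the kind of degradation over rounds that a ground-truth history can hide.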