What is Learned in Visually Grounded Neural Syntax Acquisition

Paper title

What is Learned in Visually Grounded Neural Syntax Acquisition

Authors

Noriyuki Kojima, Hadar Averbuch-Elor, Alexander M. Rush, Yoav Artzi

Abstract

Visual features are a promising signal for learning bootstrap textual models. However, blackbox learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield the model's strong performance. Contrary to what the model might be capable of learning, we find significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness plays the main role in the model's predictions as opposed to more complex syntactic reasoning.
