Paper Title

Assessing Neural Referential Form Selectors on a Realistic Multilingual Dataset

Authors

Guanyi Chen, Fahime Same, Kees van Deemter

Abstract

All previous work on neural Referring Expression Generation (REG) uses WebNLG, an English dataset that has been shown to reflect a very limited range of referring expression (RE) use. To tackle this issue, we build a dataset based on the OntoNotes corpus that contains a broader range of RE use in both English and Chinese (a language that uses zero pronouns). We build neural Referential Form Selection (RFS) models accordingly, assess them on the dataset, and conduct probing experiments. The experiments suggest that, compared to WebNLG, OntoNotes is better suited for assessing REG/RFS models. We compare English and Chinese RFS and confirm that, in line with linguistic theories, Chinese RFS depends more on discourse context than English RFS.
