Paper Title

Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension

Authors

Siyu Long, Ran Wang, Kun Tao, Jiali Zeng, Xin-Yu Dai

Abstract

Machine reading comprehension (MRC) is the task of answering questions based on a given context. In Chinese MRC, Chinese idioms pose unique challenges for machine understanding because of their non-literal and non-compositional semantics. Previous studies tend to treat idioms separately without fully exploiting the relationships among them. In this paper, we first define the concept of literal meaning coverage to measure the consistency between the semantics and the literal meanings of Chinese idioms. Using this definition, we prove that the literal meanings of many idioms are far from their semantics, and we also verify that the synonymic relationship can mitigate this inconsistency, which benefits idiom comprehension. Furthermore, to fully utilize the synonymic relationship, we propose the synonym knowledge enhanced reader. Specifically, for each idiom, we first construct a synonym graph from either the annotations of a high-quality synonym dictionary or the cosine similarity between pre-trained idiom embeddings, and then encode the graph with a graph attention network combined with a gate mechanism. Experimental results on ChID, a large-scale Chinese idiom reading comprehension dataset, show that our model achieves state-of-the-art performance.
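The graph construction and encoding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The embeddings in TOY_EMB are made-up 3-dimensional vectors, and several parts are simplifying assumptions: the top-k/threshold rule for adding edges, the use of an identity projection in place of a learned weight matrix, and the scalar sigmoid gate. The helper names (build_synonym_graph, attend, gated_fusion) are hypothetical. The sketch performs one single-head graph-attention step over an idiom and its synonyms, then applies a gate that mixes the idiom's own vector with the aggregated synonym vector.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings (made-up numbers, not real pre-trained vectors).
TOY_EMB: Dict[str, Vector] = {
    "画蛇添足": [0.9, 0.1, 0.0],
    "多此一举": [0.8, 0.2, 0.1],
    "一心一意": [0.1, 0.9, 0.2],
    "全心全意": [0.0, 0.8, 0.4],
}


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def build_synonym_graph(emb: Dict[str, Vector], k: int = 2,
                        threshold: float = 0.5) -> Dict[str, List[str]]:
    """Link each idiom to its top-k most similar idioms whose cosine
    similarity is at least `threshold` (k and threshold are assumptions)."""
    graph: Dict[str, List[str]] = {}
    for idiom, vec in emb.items():
        scored = sorted(
            ((cosine(vec, other_vec), other)
             for other, other_vec in emb.items() if other != idiom),
            reverse=True,
        )
        graph[idiom] = [other for sim, other in scored[:k] if sim >= threshold]
    return graph


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attend(idiom: str, emb: Dict[str, Vector], graph: Dict[str, List[str]],
           a_self: Vector, a_nbr: Vector) -> Vector:
    """Single-head GAT-style aggregation over the idiom and its synonyms.

    Score e_ij = LeakyReLU(a_self . h_i + a_nbr . h_j), i.e. a^T [h_i || h_j],
    with the projection W assumed to be the identity for brevity.
    """
    nodes = [idiom] + graph[idiom]  # include a self-loop
    h_i = emb[idiom]
    scores = [leaky_relu(dot(a_self, h_i) + dot(a_nbr, emb[j])) for j in nodes]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    alphas = [w / total for w in weights]
    return [sum(alpha * emb[j][d] for alpha, j in zip(alphas, nodes))
            for d in range(len(h_i))]


def gated_fusion(h: Vector, h_graph: Vector, w_gate: Vector,
                 b_gate: float = 0.0) -> Vector:
    """Scalar gate g = sigmoid(w_gate . [h ; h_graph] + b_gate) (an assumed
    form); the output is g * h + (1 - g) * h_graph."""
    z = dot(w_gate, h + h_graph) + b_gate  # list + list concatenates
    g = 1.0 / (1.0 + math.exp(-z))
    return [g * x + (1 - g) * y for x, y in zip(h, h_graph)]


if __name__ == "__main__":
    graph = build_synonym_graph(TOY_EMB, k=1, threshold=0.5)
    print(graph)
    a = [0.5, 0.5, 0.5]  # hypothetical attention parameters
    h_graph = attend("画蛇添足", TOY_EMB, graph, a, a)
    fused = gated_fusion(TOY_EMB["画蛇添足"], h_graph, [0.1] * 6)
    print([round(x, 3) for x in fused])
```

For the dictionary-based variant, the output of build_synonym_graph would simply be replaced by the dictionary's synonym lists. The attention and gating steps stay the same.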