Paper Title

Morphological Disambiguation from Stemming Data

Author

Nzeyimana, Antoine

Abstract

Morphological analysis and disambiguation is an important task and a crucial preprocessing step in natural language processing of morphologically rich languages. Kinyarwanda, a morphologically rich language, currently lacks tools for automated morphological analysis. While linguistically curated finite state tools can be easily developed for morphological analysis, the morphological richness of the language allows many ambiguous analyses to be produced, requiring effective disambiguation. In this paper, we propose learning to morphologically disambiguate Kinyarwanda verbal forms from a new stemming dataset collected through crowd-sourcing. Using feature engineering and a feed-forward neural network based classifier, we achieve about 89% non-contextualized disambiguation accuracy. Our experiments reveal that inflectional properties of stems and morpheme association rules are the most discriminative features for disambiguation.
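To make the idea of non-contextualized disambiguation concrete, the following is a minimal sketch, not the author's implementation. It assumes each candidate analysis of a word form has already been turned into a numeric feature vector, scores every candidate with a small feed-forward network, and keeps the highest-scoring one. The names `make_network`, `score`, and `disambiguate`, the placeholder candidate labels, the feature values, and the untrained random weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def make_network(n_features, n_hidden, seed=0):
    """Build a one-hidden-layer feed-forward scorer with random, untrained weights."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def score(network, features):
    """Return a value in (0, 1): how plausible the network finds one candidate."""
    w1, b1, w2, b2 = network
    # Hidden layer with ReLU activation.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    # Single sigmoid output unit.
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def disambiguate(network, candidates):
    """Pick the best-scoring analysis for one word form, ignoring sentence context."""
    scores = [score(network, feats) for _, feats in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best][0], scores


# Hypothetical candidates: (analysis label, feature vector). In a real system the
# features would be engineered from the stem and morphemes of each analysis.
network = make_network(n_features=3, n_hidden=4, seed=1)
candidates = [
    ("analysis_A", [1.0, 0.0, 0.5]),
    ("analysis_B", [0.0, 1.0, 0.2]),
]
chosen, scores = disambiguate(network, candidates)
print(chosen, scores)
```

In practice the weights would be learned from labeled data such as the crowd-sourced stemming dataset the abstract describes; with random weights the chosen label carries no meaning and only shows the control flow.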
