Paper title

Uzbek affix finite state machine for stemming

Authors

Maksud Sharipov, Ulugbek Salaev

Abstract

This work presents a morphological analyzer for the Uzbek language based on finite state machines. The proposed method analyzes Uzbek words morphologically by stripping affixes to find the root, without relying on any lexicon. This allows words in large volumes of text to be analyzed at high speed and requires no memory for storing a vocabulary. Because Uzbek is an agglutinative language, it can be modeled with finite state machines (FSMs). In contrast to previous work, this study models complete FSMs for all word classes, applying the morphotactic rules of Uzbek in right-to-left order. The paper describes the stages of the method: classifying the affixes, generating an FSM for each affix class, and combining these into a head machine that analyzes a word.
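To make the idea of lexicon-free, right-to-left affix stripping concrete, here is a minimal Python sketch. It is not the authors' implementation: the suffix lists, the state order (case, then possessive, then plural), the `MIN_STEM` guard, and the function name `strip_affixes` are all illustrative assumptions covering only a few common noun suffixes. Each state is optional, so the machine either consumes one suffix from the end of the word or moves on to the next state without consuming anything.

```python
# Hypothetical, heavily simplified suffix inventory for Uzbek nouns.
# These lists are illustrative assumptions, not the paper's affix classes.
CASE = ("ning", "dan", "ni", "ga", "da")
POSSESSIVE = ("imiz", "ingiz", "im", "ing", "i")
PLURAL = ("lar",)

# States are visited right to left, i.e. from the outermost suffix inward:
# root + plural + possessive + case  is read as  case -> possessive -> plural.
STATES = (("case", CASE), ("possessive", POSSESSIVE), ("plural", PLURAL))

# Assumed guard against stripping a word down to nothing.
MIN_STEM = 3


def strip_affixes(word):
    """Return (stem, removed), where removed lists (state, suffix) pairs."""
    stem = word.lower()
    removed = []
    for name, suffixes in STATES:
        # Try longer suffixes first so "ning" is preferred over "ni", etc.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                removed.append((name, suffix))
                break  # at most one suffix per state, then advance
    return stem, removed


if __name__ == "__main__":
    print(strip_affixes("kitoblarimizdan"))
    # ('kitob', [('case', 'dan'), ('possessive', 'imiz'), ('plural', 'lar')])
```

Because no lexicon is consulted, a sketch like this will strip any word ending that happens to look like a suffix, so it can over-stem roots that end in one of the listed strings. The minimum stem length only reduces that risk.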
