Paper title
Uzbek affix finite state machine for stemming
Authors
Abstract
This work presents a morphological analyzer for the Uzbek language based on finite state machines. The proposed method analyzes Uzbek words morphologically by stripping affixes to find the root, without relying on any lexicon. It can therefore process words from large volumes of text at high speed and needs no memory for storing a vocabulary. Because Uzbek is an agglutinative language, its morphology can be modeled with finite state machines (FSMs). In contrast to previous work, this study models complete FSMs for all word classes, applying the morphotactic rules of Uzbek in right-to-left order. The paper describes the stages of the method: classifying the affixes, generating an FSM for each affix class, and combining these FSMs into a head machine that analyzes a word.
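The right-to-left affix stripping idea can be illustrated with a minimal sketch. It is not the paper's implementation: the suffix lists cover only a small subset of Uzbek noun suffixes (case, possessive, plural), and the state names, the `MIN_ROOT` guard, and the `stem` function are assumptions made for this example. Each state records which affix class was stripped last, so only suffixes that may legally precede it are tried next.

```python
# Illustrative subset of Uzbek noun suffixes; NOT the paper's full affix classification.
CASE = ("ning", "dan", "ni", "ga", "da")
POSSESSIVE = ("ingiz", "imiz", "ing", "im", "i")
PLURAL = ("lar",)

# Transitions are read right to left: state -> [(affix class, suffixes, next state)].
# Noun order is root + plural + possessive + case, so stripping starts with case.
TRANSITIONS = {
    "END": [("case", CASE, "CASE"),
            ("possessive", POSSESSIVE, "POSS"),
            ("plural", PLURAL, "PLURAL")],
    "CASE": [("possessive", POSSESSIVE, "POSS"),
             ("plural", PLURAL, "PLURAL")],
    "POSS": [("plural", PLURAL, "PLURAL")],
    "PLURAL": [],
}

MIN_ROOT = 2  # hypothetical guard: never strip a root below this length


def stem(word):
    """Strip suffixes from the end of `word`; return (root, stripped affixes)."""
    state = "END"
    stripped = []
    while True:
        for label, suffixes, next_state in TRANSITIONS[state]:
            # Try the longest suffix of this class first.
            match = next(
                (s for s in sorted(suffixes, key=len, reverse=True)
                 if word.endswith(s) and len(word) - len(s) >= MIN_ROOT),
                None,
            )
            if match:
                word = word[:-len(match)]
                stripped.append((label, match))
                state = next_state
                break
        else:
            # No transition fired: what remains is taken as the root.
            return word, stripped


print(stem("kitoblarimni"))
# ('kitob', [('case', 'ni'), ('possessive', 'im'), ('plural', 'lar')])
```

Because no lexicon is consulted, a sketch like this will also strip a final string from a root that merely happens to end in one of the listed suffixes; the length guard only limits that effect.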