Title

Automatic Lexical Simplification for Turkish

Author

Uluslu, Ahmet Yavuz

Abstract

In this paper, we present the first automatic lexical simplification system for the Turkish language. Recent text simplification efforts rely on manually crafted simplified corpora and comprehensive NLP tools that can analyse the target text at both the word and sentence levels. Turkish is a morphologically rich agglutinative language that requires unique considerations, such as the proper handling of inflectional cases. Because Turkish is a low-resource language in terms of available resources and industrial-strength tools, the text simplification task is harder to approach. We present a new text simplification pipeline based on the pretrained representation model BERT together with morphological features to generate grammatically correct and semantically appropriate word-level simplifications.
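The abstract notes that Turkish requires proper handling of inflectional cases. The following is a hypothetical illustration of that point, not the paper's pipeline. A simpler substitute cannot just replace the stem of an inflected word, because the case suffix must be re-harmonised to the new stem. The sketch assumes a morphological analyser has already split the original word into stem and case. It covers only singular noun case suffixes (vowel harmony, buffer consonants, d/t assimilation) and deliberately ignores consonant softening, vowel dropping and loanword exceptions. The function names and word pairs are illustrative assumptions.

```python
# Hypothetical sketch: re-inflecting a simpler substitute stem with the case of
# the original word. Not the paper's method; simplified Turkish morphology only.

BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
ROUNDED_VOWELS = set("ouöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
# Voiceless consonants turn a suffix-initial d into t (kitap -> kitapta).
VOICELESS = set("fstkçşhp")


def last_vowel(stem: str) -> str:
    """Return the last vowel of the stem; vowel harmony is decided by it."""
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in stem: {stem!r}")


def two_way(stem: str) -> str:
    """e/a harmony: front last vowel -> e, back last vowel -> a."""
    return "e" if last_vowel(stem) in FRONT_VOWELS else "a"


def four_way(stem: str) -> str:
    """ı/i/u/ü harmony: depends on frontness and rounding of the last vowel."""
    v = last_vowel(stem)
    if v in FRONT_VOWELS:
        return "ü" if v in ROUNDED_VOWELS else "i"
    return "u" if v in ROUNDED_VOWELS else "ı"


def inflect(stem: str, case: str) -> str:
    """Attach a singular case suffix to a noun stem.

    Assumption: ignores consonant softening (kitap -> kitabı), vowel dropping
    (şehir -> şehri) and loanword exceptions (saat -> saati).
    """
    ends_in_vowel = stem[-1] in VOWELS
    if case == "nom":
        return stem
    if case == "acc":  # -(y)I
        return stem + ("y" if ends_in_vowel else "") + four_way(stem)
    if case == "dat":  # -(y)E
        return stem + ("y" if ends_in_vowel else "") + two_way(stem)
    if case == "gen":  # -(n)In
        return stem + ("n" if ends_in_vowel else "") + four_way(stem) + "n"
    d = "t" if stem[-1] in VOICELESS else "d"
    if case == "loc":  # -DE
        return stem + d + two_way(stem)
    if case == "abl":  # -DEn
        return stem + d + two_way(stem) + "n"
    raise ValueError(f"unknown case: {case!r}")


# Hypothetical analysis of "otomobile" ("to the automobile"): stem + dative.
original_stem, case = "otomobil", "dat"
naive = "otomobile".replace(original_stem, "araba")  # "arabae", ungrammatical
fixed = inflect("araba", case)                        # "arabaya"
print(naive, fixed)
```

Naive string replacement keeps the front-vowel suffix of the old stem and misses the buffer consonant. Re-inflecting from the substitute's own stem yields the grammatical form.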