Paper title
Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
Paper authors
Abstract
Words are properly segmented in the Persian writing system; in practice, however, these writing rules are often neglected, resulting in single words being written disjointedly and multiple words written without any white spaces between them. This paper addresses the problems of word segmentation and zero-width non-joiner (ZWNJ) recognition in Persian, which we approach jointly as a sequence labeling problem. We achieved a macro-averaged F1-score of 92.40% on a carefully collected corpus of 500 sentences with a high level of difficulty.
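The joint sequence-labeling formulation and the macro-averaged F1 metric mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the label set, so the three-way scheme below is an assumption made for illustration: each character receives a label describing what should follow it (`NONE`, `SPACE`, or `ZWNJ`). The function names `to_labels`, `apply_labels`, and `macro_f1` are hypothetical, and no BERT model is involved; in a setup like the one described, a tagger would predict these labels and `apply_labels` would rebuild the corrected text.

```python
ZWNJ = "\u200c"  # zero-width non-joiner (U+200C)
LABELS = ("NONE", "SPACE", "ZWNJ")  # assumed label set, not taken from the paper


def to_labels(text):
    """Strip spaces and ZWNJs; label each remaining character with what followed it."""
    chars, labels = [], []
    for ch in text:
        if ch == " " or ch == ZWNJ:
            if labels:  # separators before the first character are ignored
                labels[-1] = "SPACE" if ch == " " else "ZWNJ"
        else:
            chars.append(ch)
            labels.append("NONE")
    return chars, labels


def apply_labels(chars, labels):
    """Rebuild text by re-inserting the separator named by each character's label."""
    out = []
    for ch, label in zip(chars, labels):
        out.append(ch)
        if label == "SPACE":
            out.append(" ")
        elif label == "ZWNJ":
            out.append(ZWNJ)
    return "".join(out)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-label F1 scores (a label with no hits scores 0)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A correctly written phrase containing one ZWNJ and one space.
    reference = "می" + ZWNJ + "خواهم بروم"
    chars, gold = to_labels(reference)
    # Simulate a prediction that misses every separator.
    pred = ["NONE"] * len(gold)
    print(gold)
    print(apply_labels(chars, gold) == reference)
    print(macro_f1(gold, pred))
```

Because the macro average weights every label equally, the rare `ZWNJ` and `SPACE` labels count as much as the very frequent `NONE` label, so a tagger cannot score well simply by predicting that no separator is needed.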