Paper Title

Exhaustive Entity Recognition for Coptic: Challenges and Solutions

Authors

Amir Zeldes, Lance Martin, Sichang Tu

Abstract

Entity recognition provides semantic access to ancient materials in the Digital Humanities: it exposes people and places of interest in texts that cannot be read exhaustively, facilitates linking resources and can provide a window into text contents, even for texts with no translations. In this paper we present entity recognition for Coptic, the language of Hellenistic era Egypt. We evaluate NLP approaches to the task and lay out difficulties in applying them to a low-resource, morphologically complex language. We present solutions for named and non-named nested entity recognition and semi-automatic entity linking to Wikipedia, relying on robust dependency parsing, feature-based CRF models, and hand-crafted knowledge base resources, enabling high accuracy NER with orders of magnitude less data than those used for high resource languages. The results suggest avenues for research on other languages in similar settings.