Paper Title

ParsEL 1.0: Unsupervised Entity Linking in Persian Social Media Texts

Authors

Majid Asgari-Bidhendi, Farzane Fakhrian, Behrouz Minaei-Bidgoli

Abstract

In recent years, social media data has grown exponentially, making it one of the largest data repositories in the world. A large portion of this data is natural language text. Natural language, however, is highly ambiguous, because entities are frequently mentioned using polysemous words or phrases. Entity linking is the task of linking entity mentions in a text to their corresponding entities in a knowledge base. FarsBase, a recently introduced Persian knowledge graph, contains almost half a million entities. In this paper, we propose an unsupervised Persian entity linking system, the first entity linking system focused specifically on the Persian language, which uses both context-dependent and context-independent features. For this purpose, we also publish the first entity linking corpus for the Persian language, containing 67,595 words crawled from social media texts of some popular channels in the Telegram messenger. The proposed method achieves an F-score of 86.94% on Persian, which is comparable to similar state-of-the-art methods for English.
