Paper Title

Multilingual News Location Detection using an Entity-Based Siamese Network with Semi-Supervised Contrastive Learning and Knowledge Base

Authors

Víctor Suárez-Paniagua, Steven Derby, Tri Kurniawan Wijaya

Abstract

Early detection of the relevant locations in a piece of news is especially important in extreme events such as environmental disasters, wars, disease outbreaks, or political turmoil. This detection also helps recommender systems promote relevant news based on user locations. When the relevant locations are not mentioned explicitly in the text, state-of-the-art methods typically fail to recognize them because they rely on syntactic recognition. In contrast, by incorporating a knowledge base and connecting entities with their locations, our system successfully infers the relevant locations even when they are not mentioned explicitly in the text. To evaluate the effectiveness of our approach, and due to the lack of datasets in this area, we also contribute a gold-standard multilingual news-location dataset, NewsLOC, to the research community. It contains annotations of the relevant locations (and their Wikidata IDs) for 600+ Wikinews articles in five languages: English, French, German, Italian, and Spanish. Through experimental evaluations, we show that our proposed system outperforms the baselines and the fine-tuned version of the model using semi-supervised data that increases the classification rate. The source code and the NewsLOC dataset are publicly available to the research community at https://github.com/vsuarezpaniagua/NewsLocation.
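
The idea of connecting entities with their locations through a knowledge base can be illustrated with a toy sketch. This is not the authors' Siamese network or their released code: the dictionary `TOY_KB`, its entries, and the function `infer_locations` are hypothetical names made up for illustration, and a simple vote count stands in for the learned model.

```python
from collections import Counter

# Hypothetical toy knowledge base mapping an entity to a location.
# These entries are illustrative only, not taken from the paper or from Wikidata.
TOY_KB = {
    "Eiffel Tower": "France",
    "Louvre": "France",
    "Bundestag": "Germany",
}


def infer_locations(entities, kb=TOY_KB):
    """Rank locations by how many of the given entities the KB links to them.

    Entities missing from the KB are ignored.
    """
    counts = Counter(kb[e] for e in entities if e in kb)
    # most_common() orders by count, highest first.
    return [location for location, _ in counts.most_common()]


# The text never has to name "France" for it to be inferred.
print(infer_locations(["Eiffel Tower", "Bundestag", "Louvre"]))
```

Here a location is ranked first when more of the article's entities link to it, even if the location itself never appears in the text; the system described in the abstract replaces this counting step with a trained entity-based model.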
