Paper Title

Extracting a Knowledge Base of COVID-19 Events from Social Media

Authors

Shi Zong, Ashutosh Baheti, Wei Xu, Alan Ritter

Abstract

In this paper, we present a manually annotated corpus of 10,000 tweets containing public reports of five COVID-19 events, including positive and negative tests, deaths, denied access to testing, claimed cures and preventions. We designed slot-filling questions for each event type and annotated a total of 31 fine-grained slots, such as the location of events, recent travel, and close contacts. We show that our corpus can support fine-tuning BERT-based classifiers to automatically extract publicly reported events and help track the spread of a new disease. We also demonstrate that, by aggregating events extracted from millions of tweets, we achieve surprisingly high precision when answering complex queries, such as "Which organizations have employees that tested positive in Philadelphia?" We will release our corpus (with user-information removed), automatic extraction models, and the corresponding knowledge base to the research community.
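
The aggregation step mentioned in the abstract can be illustrated with a small sketch. It assumes each extracted event is stored as a flat record with an event type and a few filled slots. The field names (`event`, `where`, `employer`), the function `employers_with_positive_tests`, and the placeholder records are hypothetical and chosen only for illustration; they are not the authors' schema, data, or released code.

```python
from collections import Counter

# Toy, invented records standing in for events extracted from tweets.
# Field names are assumptions for this sketch, not the paper's format.
events = [
    {"event": "tested_positive", "where": "Philadelphia", "employer": "Org A"},
    {"event": "tested_positive", "where": "Philadelphia", "employer": "Org A"},
    {"event": "tested_positive", "where": "Philadelphia", "employer": "Org B"},
    {"event": "tested_positive", "where": "Seattle", "employer": "Org C"},
    {"event": "death", "where": "Philadelphia", "employer": "Org B"},
    {"event": "tested_positive", "where": "Philadelphia", "employer": None},
]


def employers_with_positive_tests(events, city, min_count=1):
    """Count employers mentioned in positive-test events for one city.

    Records with an unfilled employer slot are skipped. Results are
    sorted by how many extracted events mention each employer.
    """
    counts = Counter(
        e["employer"]
        for e in events
        if e["event"] == "tested_positive"
        and e["where"] == city
        and e["employer"] is not None
    )
    return [(org, n) for org, n in counts.most_common() if n >= min_count]


print(employers_with_positive_tests(events, "Philadelphia"))
# [('Org A', 2), ('Org B', 1)]
```

Raising `min_count` keeps only employers mentioned in several extracted events. This is one plausible way that aggregating over many tweets could filter out isolated extraction errors, though the paper's actual aggregation procedure may differ.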
