Paper Title
Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19
Paper Authors
Paper Abstract
This report describes the participation of two Danish universities, the University of Copenhagen and Aalborg University, in the international search engine competition on COVID-19 (the 2020 TREC-COVID Challenge) organised by the U.S. National Institute of Standards and Technology (NIST) and its Text Retrieval Conference (TREC) division. The aim of the competition was to find the best search engine strategy for retrieving precise biomedical scientific information on COVID-19 from the largest dataset, at that point in time, of curated scientific literature on COVID-19 -- the COVID-19 Open Research Dataset (CORD-19). CORD-19 was the result of a call to action to the tech community by the U.S. White House in March 2020, and was shortly thereafter posted on Kaggle as an AI competition by the Allen Institute for AI, the Chan Zuckerberg Initiative, Georgetown University's Center for Security and Emerging Technology, Microsoft, and the National Library of Medicine at the U.S. National Institutes of Health. CORD-19 contained over 200,000 scholarly articles (of which more than 100,000 had full text) about COVID-19, SARS-CoV-2, and related coronaviruses, gathered from curated biomedical sources. The TREC-COVID challenge asked for the best way to (a) retrieve accurate and precise scientific information in response to queries formulated by biomedical experts, and (b) rank this information in decreasing order of relevance to the query. In this document, we describe the TREC-COVID competition setup, our participation in it, and our resulting reflections and lessons learned about state-of-the-art technology when faced with the acute task of retrieving precise scientific information from a rapidly growing corpus of literature, in response to highly specialised queries, in the middle of a pandemic.