Information Extraction of Clinical Trial Eligibility Criteria

Title

Information Extraction of Clinical Trial Eligibility Criteria

Authors

Tseo, Yitong, Salkola, M. I., Mohamed, Ahmed, Kumar, Anuj, Abnousi, Freddy

Abstract

Clinical trials predicate subject eligibility on a diversity of criteria ranging from patient demographics to food allergies. Trials post their requirements as semantically complex, unstructured free text. Formalizing trial criteria into a computer-interpretable syntax would facilitate eligibility determination. In this paper, we investigate an information extraction (IE) approach for grounding criteria from trials in ClinicalTrials.gov to a shared knowledge base. We frame the problem as a novel knowledge base population task and implement a solution combining machine learning and a context-free grammar. To our knowledge, this work is the first criteria extraction system to apply an attention-based conditional random field architecture for named entity recognition (NER) and word2vec embedding clustering for named entity linking (NEL). We release the resources and core components of our system on GitHub at https://github.com/facebookresearch/Clinical-Trial-Parser. Finally, we report per-module and end-to-end performance; we conclude that our system is competitive with Criteria2Query, which we view as the current state of the art in criteria extraction.
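The abstract names word2vec embedding clustering for NEL but does not spell out the procedure. The snippet below is a minimal sketch under stated assumptions, not the Clinical-Trial-Parser implementation: each knowledge-base concept is assumed to be represented by the centroid of its synonyms' averaged word vectors, and a recognized mention is linked to the nearest centroid by cosine similarity. The word vectors, the concept IDs (`C_DIABETES_T2`, `C_HYPERTENSION`, `C_PEANUT_ALLERGY`), and the 0.8 threshold are hypothetical toy values.

```python
import math

# Hypothetical toy word vectors; a real system would load word2vec
# embeddings trained on clinical text.
WORD_VECTORS = {
    "diabetes": [0.9, 0.1, 0.0], "mellitus": [0.8, 0.2, 0.1],
    "type": [0.5, 0.5, 0.0], "2": [0.4, 0.4, 0.2], "t2dm": [0.85, 0.15, 0.05],
    "hypertension": [0.1, 0.9, 0.1], "high": [0.2, 0.7, 0.3],
    "blood": [0.1, 0.8, 0.2], "pressure": [0.15, 0.85, 0.1],
    "peanut": [0.0, 0.1, 0.9], "allergy": [0.1, 0.2, 0.8],
}


def mean(vectors):
    """Element-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def embed(phrase):
    """Average the vectors of known tokens; None if no token is known."""
    vecs = [WORD_VECTORS[t] for t in phrase.lower().split() if t in WORD_VECTORS]
    return mean(vecs) if vecs else None


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_centroids(kb):
    """Represent each concept by the centroid of its synonym embeddings (assumption)."""
    centroids = {}
    for concept, synonyms in kb.items():
        vecs = [v for v in (embed(s) for s in synonyms) if v is not None]
        if vecs:
            centroids[concept] = mean(vecs)
    return centroids


def link(mention, centroids, threshold=0.8):
    """Return the nearest concept ID, or None if unknown or below the (hypothetical) threshold."""
    vec = embed(mention)
    if vec is None:
        return None
    best, best_score = None, -1.0
    for concept, centroid in centroids.items():
        score = cosine(vec, centroid)
        if score > best_score:
            best, best_score = concept, score
    return best if best_score >= threshold else None


# Hypothetical knowledge base: concept ID -> synonym strings.
KB = {
    "C_DIABETES_T2": ["type 2 diabetes mellitus", "t2dm"],
    "C_HYPERTENSION": ["hypertension", "high blood pressure"],
    "C_PEANUT_ALLERGY": ["peanut allergy"],
}

if __name__ == "__main__":
    centroids = build_centroids(KB)
    print(link("diabetes", centroids))        # → C_DIABETES_T2
    print(link("blood pressure", centroids))  # → C_HYPERTENSION
```

Averaging synonyms into a centroid lets surface variants such as "t2dm" and "type 2 diabetes mellitus" pull a concept toward a shared region of the embedding space; the threshold lets the linker decline to ground a mention rather than force a poor match.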