Paper Title
Foresight -- Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs
Paper Authors
Paper Abstract
Background: Electronic Health Records (EHRs) hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored as unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We explore how deep generative transformers can be used to model patients temporally from free text and structured data in order to forecast a wide range of future disorders, substances, procedures or findings.
Methods: We present Foresight, a novel transformer-based pipeline that uses named entity recognition and linking tools to convert document text into structured, coded concepts and then provides probabilistic forecasts of future medical events such as disorders, substances, procedures and findings. We processed the entire free-text portion of three different hospital datasets, totalling 811,336 patients and covering both physical and mental health.
Findings: On tests in two UK hospitals (King's College Hospital and South London and Maudsley) and the US MIMIC-III dataset, Foresight achieved precision@10 of 0.68, 0.76 and 0.88, respectively, for forecasting the next disorder in a patient timeline, and precision@10 of 0.80, 0.81 and 0.91, respectively, for forecasting the next biomedical concept. Foresight was also validated by five clinicians on 34 synthetic patient timelines, achieving 97% relevancy for the top forecasted candidate disorder. As a generative model, it can forecast follow-on biomedical concepts for as many steps as required.
Interpretation: Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials and clinical research (to study the progression of disorders and to simulate interventions and counterfactuals), as well as for educational purposes.
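The findings are reported as precision@10. As a rough illustration only, the sketch below assumes precision@k is a hit-rate-style score: the fraction of forecast positions at which the true next concept appears among the model's k highest-ranked candidates. The function name `precision_at_k` and the toy concept labels are hypothetical, and Foresight's exact evaluation protocol may differ from this definition.

```python
from typing import Sequence


def precision_at_k(
    ranked_candidates: Sequence[Sequence[str]],
    true_next: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of forecast positions whose true next concept is in the top-k.

    Assumption: this hit-rate definition is illustrative; it is not taken
    from the Foresight paper, whose precision@k formulation may differ.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_candidates) != len(true_next):
        raise ValueError("need one ranked candidate list per true concept")
    if not true_next:
        raise ValueError("no forecast positions to score")
    # Count positions where the ground-truth concept is among the first k candidates.
    hits = sum(
        1
        for candidates, truth in zip(ranked_candidates, true_next)
        if truth in candidates[:k]
    )
    return hits / len(true_next)


# Toy example with hypothetical concept labels (not real model output).
ranked = [
    ["concept_A", "concept_B", "concept_C"],  # candidates ranked by probability
    ["concept_D", "concept_E", "concept_F"],
]
truth = ["concept_B", "concept_X"]
print(precision_at_k(ranked, truth, k=2))  # 0.5: first position hits, second misses
```

Under this definition, a larger k can only keep or raise the score, which is why results are quoted at a fixed cut-off such as k = 10.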