Paper Title

BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining

Authors

Zachariah Zhang, Jingshu Liu, Narges Razavian

Abstract

Clinical interactions are initially recorded and documented in free-text medical notes. ICD coding is the task of classifying and coding all diagnoses, symptoms, and procedures associated with a patient's visit. The process is often manual, extremely time-consuming, and expensive for hospitals. In this paper, we propose a machine learning model, BERT-XML, for large-scale automated ICD coding from EHR notes, utilizing recently developed unsupervised pretraining techniques that have achieved state-of-the-art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning a vocabulary better suited to EHR tasks and thus outperforming off-the-shelf models. We adapt the BERT architecture for ICD coding with multi-label attention. While other works focus on small public medical datasets, we have produced the first large-scale ICD-10 classification model, using millions of EHR notes to predict thousands of unique ICD codes.
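
The "multi-label attention" adaptation mentioned in the abstract can be pictured as one learned attention query per ICD code: each code attends over the BERT token representations of a note, pools a code-specific document vector, and feeds it to a per-code binary classifier. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions; it is not the paper's exact implementation, and the class name, dimensions, and einsum-based pooling are illustrative choices.

```python
import torch
import torch.nn as nn


class MultiLabelAttentionHead(nn.Module):
    """Per-label attention pooling over token states, followed by
    per-label binary classification (a sketch of multi-label
    attention for extreme multi-label ICD coding)."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One learned attention query vector per ICD code (assumed design).
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_size))
        # One linear binary classifier per ICD code.
        self.classifiers = nn.Parameter(torch.randn(num_labels, hidden_size))
        self.bias = nn.Parameter(torch.zeros(num_labels))

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden), e.g. the last hidden
        # states of a BERT encoder run over an EHR note.
        # Attention scores per label and token: (batch, num_labels, seq_len)
        scores = torch.einsum("bsh,lh->bls", token_states, self.label_queries)
        weights = torch.softmax(scores, dim=-1)
        # Label-specific pooled document vectors: (batch, num_labels, hidden)
        pooled = torch.einsum("bls,bsh->blh", weights, token_states)
        # Per-label logits: (batch, num_labels); train with BCEWithLogitsLoss
        # since multiple ICD codes can be active for one visit.
        return (pooled * self.classifiers).sum(-1) + self.bias


if __name__ == "__main__":
    # Stand-in for BERT output: batch of 2 notes, 128 tokens, hidden 768.
    hidden_states = torch.randn(2, 128, 768)
    head = MultiLabelAttentionHead(hidden_size=768, num_labels=5000)
    logits = head(hidden_states)
    print(logits.shape)  # torch.Size([2, 5000])
```

One per-label query rather than a shared pooled [CLS] vector lets each of the thousands of ICD codes focus on different evidence spans in a long note, which is the usual motivation for this kind of head in extreme multi-label text classification.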
