Paper Title

BERT-XML: Large Scale Automated ICD Coding Using BERT Pretraining

Authors

Zachariah Zhang, Jingshu Liu, Narges Razavian

Abstract

Clinical interactions are initially recorded and documented in free-text medical notes. ICD coding is the task of classifying and coding all diagnoses, symptoms, and procedures associated with a patient's visit. The process is often manual, extremely time-consuming, and expensive for hospitals. In this paper, we propose a machine learning model, BERT-XML, for large-scale automated ICD coding from EHR notes, utilizing recently developed unsupervised pretraining techniques that have achieved state-of-the-art performance on a variety of NLP tasks. We train a BERT model from scratch on EHR notes, learning a vocabulary better suited to EHR tasks and thus outperforming off-the-shelf models. We adapt the BERT architecture for ICD coding with multi-label attention. While other works focus on small public medical datasets, we have produced the first large-scale ICD-10 classification model, using millions of EHR notes to predict thousands of unique ICD codes.
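
The "multi-label attention" adaptation mentioned in the abstract can be pictured as one learned attention query per ICD code: each code attends over the BERT token representations of a note, pools a code-specific document vector, and feeds it to a per-code binary classifier. The sketch below is a minimal PyTorch illustration of that idea under stated assumptions; it is not the paper's exact implementation, and the class name, dimensions, and einsum-based pooling are illustrative choices.

```python
import torch
import torch.nn as nn


class MultiLabelAttentionHead(nn.Module):
    """Per-label attention pooling over token states, followed by
    per-label binary classification (a sketch of multi-label
    attention for extreme multi-label ICD coding)."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        # One learned attention query vector per ICD code (assumed design).
        self.label_queries = nn.Parameter(torch.randn(num_labels, hidden_size))
        # One linear binary classifier per ICD code.
        self.classifiers = nn.Parameter(torch.randn(num_labels, hidden_size))
        self.bias = nn.Parameter(torch.zeros(num_labels))

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden), e.g. the last hidden
        # states of a BERT encoder run over an EHR note.
        # Attention scores per label and token: (batch, num_labels, seq_len)
        scores = torch.einsum("bsh,lh->bls", token_states, self.label_queries)
        weights = torch.softmax(scores, dim=-1)
        # Label-specific pooled document vectors: (batch, num_labels, hidden)
        pooled = torch.einsum("bls,bsh->blh", weights, token_states)
        # Per-label logits: (batch, num_labels); train with BCEWithLogitsLoss
        # since multiple ICD codes can be active for one visit.
        return (pooled * self.classifiers).sum(-1) + self.bias


if __name__ == "__main__":
    # Stand-in for BERT output: batch of 2 notes, 128 tokens, hidden 768.
    hidden_states = torch.randn(2, 128, 768)
    head = MultiLabelAttentionHead(hidden_size=768, num_labels=5000)
    logits = head(hidden_states)
    print(logits.shape)  # torch.Size([2, 5000])
```

One per-label query rather than a shared pooled [CLS] vector lets each of the thousands of ICD codes focus on different evidence spans in a long note, which is the usual motivation for this kind of head in extreme multi-label text classification.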
