Paper Title

A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data

Paper Authors

Meaney, Christopher; Hakimpour, Wali; Kalia, Sumeet; Moineddin, Rahim

Paper Abstract

Objective: To comparatively evaluate several transformer model architectures at identifying protected health information (PHI) in the i2b2/UTHealth 2014 clinical text de-identification challenge corpus. Methods: The i2b2/UTHealth 2014 corpus contains N=1304 clinical notes obtained from N=296 patients. Using a transfer learning framework, we fine-tune several transformer model architectures on the corpus, including: BERT-base, BERT-large, RoBERTa-base, RoBERTa-large, ALBERT-base and ALBERT-xxlarge. During fine-tuning we vary the following model hyper-parameters: batch size, number of training epochs, learning rate and weight decay. We fine-tune models on a training dataset, evaluate and select optimally performing models on an independent validation dataset, and lastly assess generalization performance on a held-out test dataset. We assess model performance in terms of accuracy, precision (positive predictive value), recall (sensitivity) and F1 score (harmonic mean of precision and recall). We are interested in overall model performance (PHI identified vs. PHI not identified), as well as PHI-specific model performance. Results: We observe that the RoBERTa-large models perform best at identifying PHI in the i2b2/UTHealth 2014 corpus, achieving >99% overall accuracy and 96.7% recall/precision on the held-out test corpus. Performance was good across many PHI classes; however, accuracy/precision/recall decreased for identification of the following entity classes: professions, organizations, ages, and certain locations. Conclusions: Transformers are a promising model class/architecture for clinical text de-identification. With minimal hyper-parameter tuning, transformers afford researchers/clinicians the opportunity to obtain (near) state-of-the-art performance.
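The abstract describes a standard token-classification fine-tuning recipe. Below is a minimal sketch of that recipe using the Hugging Face `transformers` Trainer API; it is not the authors' code. The toy corpus, BIO label set, output directory, and hyper-parameter values are illustrative assumptions (the paper tunes batch size, number of training epochs, learning rate, and weight decay on the real i2b2/UTHealth 2014 corpus, which is distributed under a data use agreement).

```python
# Minimal sketch (assumptions throughout): PHI de-identification framed as
# token classification, fine-tuned with the Hugging Face Trainer API.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          TrainingArguments, Trainer)

labels = ["O", "B-NAME", "I-NAME", "B-DATE"]  # hypothetical PHI tag set
model_name = "roberta-base"  # the paper's best performer was RoBERTa-large

# RoBERTa tokenizers need add_prefix_space=True for pre-split word input.
tokenizer = AutoTokenizer.from_pretrained(model_name, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels))

# Tiny in-memory stand-in for the tokenized clinical notes.
raw = Dataset.from_dict({
    "tokens": [["Patient", "John", "Smith", "seen", "on", "2014-02-03", "."]],
    "tags":   [[0, 1, 2, 0, 0, 3, 0]],  # indices into `labels`
})

def encode(batch):
    # Align word-level PHI tags with sub-word tokens; -100 masks special
    # tokens out of the loss.
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    enc["labels"] = [
        [-100 if w is None else tags[w] for w in enc.word_ids(batch_index=i)]
        for i, tags in enumerate(batch["tags"])
    ]
    return enc

ds = raw.map(encode, batched=True, remove_columns=["tokens", "tags"])

args = TrainingArguments(
    output_dir="deid-sketch",        # hypothetical output path
    per_device_train_batch_size=16,  # tuned in the paper
    num_train_epochs=3,              # tuned in the paper
    learning_rate=3e-5,              # tuned in the paper
    weight_decay=0.01,               # tuned in the paper
)

Trainer(model=model, args=args, train_dataset=ds,
        data_collator=DataCollatorForTokenClassification(tokenizer)).train()
```

Swapping `model_name` among `bert-base-cased`, `roberta-large`, `albert-xxlarge-v2`, and so on reproduces the architecture comparison the abstract describes, with the best configuration chosen on the independent validation split.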

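For scoring, the abstract reports precision (positive predictive value), recall (sensitivity), and F1, the harmonic mean F1 = 2PR/(P + R), both overall and per PHI class. Below is a self-contained sketch of that style of evaluation using the `seqeval` package (an assumed tool, not necessarily the one used in the paper), which scores BIO-tagged sequences at the entity level.

```python
# Assumed tooling: entity-level precision/recall/F1 over BIO tag sequences.
from seqeval.metrics import classification_report, f1_score

# Toy gold-standard and predicted tag sequences for two clinical sentences.
y_true = [["O", "B-NAME", "I-NAME", "O", "B-DATE"],
          ["B-LOCATION", "O", "O"]]
y_pred = [["O", "B-NAME", "I-NAME", "O", "B-DATE"],
          ["O", "O", "O"]]  # the LOCATION entity is missed

print(classification_report(y_true, y_pred))  # per-class precision/recall/F1
print("overall F1:", f1_score(y_true, y_pred))
```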