Paper Title
An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text
Paper Authors
Paper Abstract
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research. Given these models' state-of-the-art performance in other NLP domains, recent work has applied BERT-based models to clinical information extraction and text classification. However, BERT is difficult to apply to clinical notes because it does not scale well to long text sequences. In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner. Empirically, SnipBERT not only achieves significant predictive performance gains across three tasks but also provides improved interpretability, as the model can identify the key pieces of text that led to its predictions.
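The pipeline the abstract describes — select crucial snippets from a long note, encode each with a truncated encoder, then aggregate hierarchically — can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's actual method: snippet relevance is scored by keyword overlap, and `encode` is a toy stand-in for a truncated BERT encoder.

```python
# Hedged sketch of a SnipBERT-style pipeline. Assumptions (not from the paper):
# keyword-overlap snippet scoring and a toy `encode` stand-in for truncated BERT.
from typing import List, Set

def split_into_snippets(note: str, size: int = 5) -> List[str]:
    """Split a clinical note into fixed-size word windows."""
    words = note.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score_snippet(snippet: str, keywords: Set[str]) -> int:
    """Toy relevance score: number of task keywords appearing in the snippet."""
    return sum(1 for w in snippet.lower().split() if w in keywords)

def select_top_snippets(note: str, keywords: Set[str], k: int = 2) -> List[str]:
    """Keep only the k most relevant snippets instead of feeding the whole note."""
    snippets = split_into_snippets(note)
    return sorted(snippets, key=lambda s: score_snippet(s, keywords),
                  reverse=True)[:k]

def encode(snippet: str) -> List[float]:
    """Stand-in for a truncated BERT encoder: a fixed-size feature vector
    (word count and mean word length) in place of contextual embeddings."""
    words = snippet.split()
    return [float(len(words)),
            sum(len(w) for w in words) / max(len(words), 1)]

def hierarchical_representation(snippets: List[str]) -> List[float]:
    """Aggregate per-snippet encodings into one note-level vector (mean pooling),
    mirroring the hierarchical snippet-then-note structure."""
    encs = [encode(s) for s in snippets]
    return [sum(col) / len(encs) for col in zip(*encs)]

note = ("patient denies chest pain . history of diabetes mellitus type two . "
        "ecg shows sinus rhythm . discharged home in stable condition")
keywords = {"chest", "pain", "ecg", "diabetes"}
top = select_top_snippets(note, keywords)
vec = hierarchical_representation(top)
```

The selected snippets double as the interpretability signal the abstract mentions: they are the pieces of text the model's prediction is based on.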