Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation

Paper title

Consultation Checklists: Standardising the Human Evaluation of Medical Note Generation

Authors

Aleksandar Savkov, Francesco Moramarco, Alex Papadopoulos Korfiatis, Mark Perera, Anya Belz, Ehud Reiter

Abstract

Evaluating automatically generated text is generally hard due to the inherently subjective nature of many aspects of the output quality. This difficulty is compounded in automatic consultation note generation by differing opinions between medical experts both about which patient statements should be included in generated notes and about their respective importance in arriving at a diagnosis. Previous real-world evaluations of note-generation systems saw substantial disagreement between expert evaluators. In this paper we propose a protocol that aims to increase objectivity by grounding evaluations in Consultation Checklists, which are created in a preliminary step and then used as a common point of reference during quality assessment. We observed good levels of inter-annotator agreement in a first evaluation study using the protocol; further, using Consultation Checklists produced in the study as reference for automatic metrics such as ROUGE or BERTScore improves their correlation with human judgements compared to using the original human note.
