Metrics to guide development of machine learning algorithms for malaria diagnosis

Title

Metrics to guide development of machine learning algorithms for malaria diagnosis

Authors

Delahunt, Charles B., Gachuhi, Noni, Horning, Matthew P.

Abstract

Automated malaria diagnosis is a difficult but high-value target for machine learning (ML), and effective algorithms could save many thousands of children's lives. However, current ML efforts largely neglect crucial use case constraints and are thus not clinically useful. Two factors in particular are crucial to developing algorithms translatable to clinical field settings: (i) Clear understanding of the clinical needs that ML solutions must accommodate; and (ii) task-relevant metrics for guiding and evaluating ML models. Neglect of these factors has seriously hampered past ML work on malaria, because the resulting algorithms do not align with clinical needs. In this paper we address these two issues in the context of automated malaria diagnosis via microscopy on Giemsa-stained blood films. First, we describe why domain expertise is crucial to effectively apply ML to malaria, and list technical documents and other resources that provide this domain knowledge. Second, we detail performance metrics tailored to the clinical requirements of malaria diagnosis, to guide development of ML models and evaluate model performance through the lens of clinical needs (versus a generic ML lens). We highlight the importance of a patient-level perspective, interpatient variability, false positive rates, limit of detection, and different types of error. We also discuss reasons why ROC curves, AUC, and F1, as commonly used in ML work, are poorly suited to this context. These findings also apply to other diseases involving parasite loads, including neglected tropical diseases (NTDs) such as schistosomiasis.
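As a rough illustration of why the abstract stresses patient-level false positive rates and limit of detection, the sketch below turns an object-level false detection rate into a patient-level one. It is not the authors' method. It assumes, hypothetically, that false detections on a parasite-negative thick film follow a Poisson distribution, that a fixed number of WBCs is examined per patient, and that densities are converted using an assumed 8000 WBCs/µL. The function names and all numbers are illustrative only.

```python
import math

# Hypothetical conversion factor: parasite densities are often estimated as
# parasites per WBC times an assumed WBC count per microlitre.
ASSUMED_WBC_PER_UL = 8000.0


def poisson_tail(k: int, lam: float) -> float:
    """Return P(N >= k) for N ~ Poisson(lam)."""
    cdf = 0.0
    term = math.exp(-lam)  # P(N = 0)
    for n in range(k):
        cdf += term            # accumulate P(N = n)
        term *= lam / (n + 1)  # advance to P(N = n + 1)
    return max(0.0, 1.0 - cdf)


def patient_false_positive_rate(fp_per_wbc: float, wbcs_examined: int,
                                min_detections: int) -> float:
    """Fraction of parasite-negative patients called positive.

    Assumes (hypothetically) that false detections are Poisson with mean
    fp_per_wbc * wbcs_examined, and that a patient is called positive
    when at least min_detections objects are flagged.
    """
    lam = fp_per_wbc * wbcs_examined
    return poisson_tail(min_detections, lam)


def spurious_density_per_ul(fp_per_wbc: float) -> float:
    """Parasite density (parasites/uL) that false detections alone would
    report on a negative slide; roughly bounds how low a usable limit of
    detection can go."""
    return fp_per_wbc * ASSUMED_WBC_PER_UL


if __name__ == "__main__":
    for k in (1, 3, 5):
        rate = patient_false_positive_rate(0.01, 200, k)
        print(f"threshold {k} detections: {rate:.3f} of negative patients called positive")
    print(f"spurious density: {spurious_density_per_ul(0.01):.0f} parasites/uL")
```

Under these assumptions, 0.01 false detections per WBC over 200 WBCs gives an expected 2 false detections per negative patient. Calling a patient positive on a single detection then mislabels about 86% of negative patients, a 5-detection threshold still mislabels about 5%, and the false detections alone correspond to roughly 80 parasites/µL. An object-level error rate that looks small can therefore dominate patient-level performance.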
