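A novel interpretable machine learning system to generate clinical risk scores: An application for predicting early mortality or unplanned readmission in a retrospective cohort study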

Paper Title

A novel interpretable machine learning system to generate clinical risk scores: An application for predicting early mortality or unplanned readmission in a retrospective cohort study

Authors

Ning, Yilin; Li, Siqi; Ong, Marcus Eng Hock; Xie, Feng; Chakraborty, Bibhas; Ting, Daniel Shu Wei; Liu, Nan

Abstract

Risk scores are widely used for clinical decision making and commonly generated from logistic regression models. Machine-learning-based methods may work well for identifying important predictors, but such 'black box' variable selection limits interpretability, and variable importance evaluated from a single model can be biased. We propose a robust and interpretable variable selection approach using the recently developed Shapley variable importance cloud (ShapleyVIC) that accounts for variability across models. Our approach evaluates and visualizes overall variable contributions for in-depth inference and transparent variable selection, and filters out non-significant contributors to simplify model building steps. We derive an ensemble variable ranking from variable contributions, which is easily integrated with an automated and modularized risk score generator, AutoScore, for convenient implementation. In a study of early death or unplanned readmission, ShapleyVIC selected 6 of 41 candidate variables to create a well-performing model, which had similar performance to a 16-variable model from machine-learning-based ranking.
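The abstract describes two steps: filtering out variables whose overall contribution across models is not significant, then deriving an ensemble variable ranking from those contributions. The Python sketch below illustrates that idea under simplified assumptions. It is not the ShapleyVIC or AutoScore implementation. The importance values are made-up toy numbers, not Shapley-based values from fitted models. The normal-approximation interval across models and the rank-averaging rule are assumed aggregation choices, and `ensemble_ranking` is a hypothetical helper name.

```python
from statistics import mean, stdev


def ensemble_ranking(importance, z=1.96):
    """Hypothetical helper: filter and rank variables using importance
    values observed across several models.

    importance: dict mapping variable name -> list of importance values,
    one per model (all lists the same length, at least 2 models).
    Returns (ranking, dropped): ranking orders the kept variables from most
    to least important; dropped lists the filtered-out variables.
    """
    names = list(importance)
    n_models = len(next(iter(importance.values())))

    # Step 1 (assumed rule): keep a variable only if the lower bound of an
    # approximate interval, mean - z * standard error across models, is > 0.
    kept, dropped = [], []
    for name in names:
        vals = importance[name]
        se = stdev(vals) / len(vals) ** 0.5
        (kept if mean(vals) - z * se > 0 else dropped).append(name)

    # Step 2 (assumed rule): rank the kept variables within each model
    # (1 = most important), then average each variable's rank across models.
    avg_rank = {name: 0.0 for name in kept}
    for m in range(n_models):
        order = sorted(kept, key=lambda name: importance[name][m], reverse=True)
        for rank, name in enumerate(order, start=1):
            avg_rank[name] += rank / n_models

    # A lower average rank means a more important variable.
    ranking = sorted(kept, key=lambda name: avg_rank[name])
    return ranking, dropped


# Toy importance values for 4 hypothetical variables across 3 models.
example = {
    "age": [0.30, 0.28, 0.35],
    "prior_admissions": [0.25, 0.31, 0.22],
    "lab_value": [0.05, 0.07, 0.06],
    "noise": [0.02, -0.03, 0.01],
}
ranking, dropped = ensemble_ranking(example)
print("ranking:", ranking)
print("dropped:", dropped)
```

With these toy values, `noise` is dropped because its interval does not exclude zero, and the remaining variables are ranked `age`, `prior_admissions`, `lab_value`. In the workflow the abstract describes, an ensemble ranking like this is what would be handed to AutoScore for risk score generation.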
