Uncertainty-Gated Stochastic Sequential Model for EHR Mortality Prediction

Paper title

Uncertainty-Gated Stochastic Sequential Model for EHR Mortality Prediction

Authors

Jun, Eunji; Mulyadi, Ahmad Wisnu; Choi, Jaehun; Suk, Heung-Il

Abstract

Electronic health records (EHR) are non-stationary, heterogeneous, noisy, and sparse, which makes it challenging to learn the regularities or patterns inherent in them. In particular, sparseness, caused mostly by the large number of missing values, has attracted the attention of researchers, who have attempted to make better use of all available samples for the primary target task by defining a secondary imputation problem. Methodologically, existing methods, whether deterministic or stochastic, have applied different assumptions to impute missing values. However, once the missing values are imputed, most existing methods do not consider the fidelity or confidence of the imputed values when modeling downstream tasks. An erroneous or improper imputation of missing variables can make modeling more difficult and degrade performance. In this study, we present a novel variational recurrent network that (i) estimates the distribution of missing variables, which allows the uncertainty in the imputed values to be represented, (ii) updates hidden states by explicitly applying a fidelity based on the variance of the imputed values during recurrence (i.e., uncertainty propagation over time), and (iii) predicts the probability of in-hospital mortality. Notably, our model conducts these procedures in a single stream and learns all network parameters jointly in an end-to-end manner. We validated the effectiveness of our method on the public MIMIC-III and PhysioNet Challenge 2012 datasets, where it outperformed the other state-of-the-art mortality prediction methods considered in our experiments. In addition, we found that the model represents the uncertainties of the imputed estimates well, as indicated by a high correlation between the calculated mean absolute error (MAE) and the uncertainty.
