Paper Title

Inference for the Case Probability in High-dimensional Logistic Regression

Authors

Zijian Guo, Prabrisha Rakshit, Daniel S. Herman, Jinbo Chen

Abstract

Labeling patients in electronic health records with respect to their status of having a disease or condition, i.e., case or control status, has increasingly relied on prediction models using high-dimensional variables derived from structured and unstructured electronic health record data. A major hurdle currently is a lack of valid statistical inference methods for the case probability. In this paper, considering high-dimensional sparse logistic regression models for prediction, we propose a novel bias-corrected estimator for the case probability through the development of linearization and variance enhancement techniques. We establish asymptotic normality of the proposed estimator for any loading vector in high dimensions. We construct a confidence interval for the case probability and propose a hypothesis testing procedure for patient case-control labeling. We demonstrate the proposed method via extensive simulation studies and application to real-world electronic health record data.
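
As a rough illustration of the last inferential step only, the sketch below maps an estimate of a patient's linear predictor and its standard error to a confidence interval for the case probability, using the logistic function. It does not implement the paper's bias-corrected estimator: the inputs `linear_estimate` and `standard_error` are assumed to come from such a procedure, and the function names and the 0.5 labeling threshold are hypothetical choices made here for illustration.

```python
import math
from statistics import NormalDist


def expit(z: float) -> float:
    """Logistic function h(z) = 1 / (1 + exp(-z)), evaluated stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def case_probability_ci(linear_estimate: float,
                        standard_error: float,
                        level: float = 0.95):
    """Return (lower, point, upper) for the case probability.

    Assumes linear_estimate is an approximately normal estimate of the
    linear predictor with the given standard error (both are inputs here,
    not computed by this sketch). The interval is built on the linear
    scale and then mapped through the monotone logistic function.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = linear_estimate - z * standard_error
    upper = linear_estimate + z * standard_error
    return expit(lower), expit(linear_estimate), expit(upper)


def label_as_case(linear_estimate: float,
                  standard_error: float,
                  alpha: float = 0.05) -> bool:
    """One-sided test: label as a case only if the lower confidence bound
    for the linear predictor exceeds 0, i.e. the case probability is
    significantly above the assumed threshold of 0.5."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return linear_estimate - z * standard_error > 0.0
```

For example, `case_probability_ci(0.8, 0.3)` returns an interval around `expit(0.8)`, and `label_as_case(0.8, 0.3)` checks whether that patient's probability is significantly above one half at the 5% level.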
