Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Paper Title

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Authors

Doudou Zhou, Molei Liu, Mengyan Li, Tianxi Cai

Abstract

Because label scarcity and covariate shift occur frequently in real-world studies, transfer learning has become an essential technique for training models that generalize to a target population using existing labeled source data. Most existing transfer learning research has focused on model estimation, while there is a paucity of literature on transfer inference for model accuracy despite its importance. We propose a novel $\mathbf{D}$oubly $\mathbf{R}$obust $\mathbf{A}$ugmented $\mathbf{M}$odel $\mathbf{A}$ccuracy $\mathbf{T}$ransfer $\mathbf{I}$nferen$\mathbf{C}$e (DRAMATIC) method for point and interval estimation of commonly used classification performance measures in an unlabeled target population using labeled source data. Specifically, DRAMATIC derives and evaluates the risk model for a binary response $Y$ against some low dimensional predictors $\mathbf{A}$ on the target population, leveraging $Y$ from the source data only and high dimensional adjustment features $\mathbf{X}$ from both the source and target data. The proposed estimators are doubly robust in the sense that they are $n^{1/2}$-consistent when at least one model is correctly specified and certain model sparsity assumptions hold. Simulation results demonstrate that the point estimators have negligible bias and that the confidence intervals derived by DRAMATIC attain satisfactory empirical coverage levels. We further illustrate the utility of our method by transferring a genetic risk prediction model for type II diabetes, together with its accuracy evaluation, across two patient cohorts in Mass General Brigham (MGB) that were collected using different sampling mechanisms and at different time points.
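
The double robustness property described in the abstract can be illustrated with a generic, textbook-style augmented estimator of a target-population mean loss. The sketch below is not the DRAMATIC procedure itself: it assumes the density-ratio weights and the outcome (imputation) model are simply handed in as functions, whereas the abstract describes estimating such models with high dimensional features under sparsity assumptions. The function name `dr_target_mean`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import fmean


def dr_target_mean(source_x, source_loss, target_x, weight_fn, outcome_fn):
    """Generic doubly robust (augmented) estimate of a target-population mean loss.

    Illustrative assumption: weight_fn approximates the target/source density
    ratio and outcome_fn approximates the expected loss given x. The estimate
    is correct if either of the two is correct.
    """
    # Imputation term: average predicted loss over the unlabeled target sample.
    imputation = fmean([outcome_fn(x) for x in target_x])
    # Augmentation term: weighted residuals on the labeled source sample,
    # using normalized weights.
    weights = [weight_fn(x) for x in source_x]
    residuals = [loss - outcome_fn(x) for x, loss in zip(source_x, source_loss)]
    augmentation = sum(w * r for w, r in zip(weights, residuals)) / sum(weights)
    return imputation + augmentation


# Toy data (hypothetical): one binary feature; the loss is 0.2 when x == 0
# and 0.6 when x == 1. The source is mostly x == 0, the target mostly x == 1.
source_x = [0, 0, 0, 1]
source_loss = [0.2, 0.2, 0.2, 0.6]
target_x = [0, 1, 1, 1]


def true_weight(x):
    # Target proportion divided by source proportion for each value of x.
    return (0.25 / 0.75) if x == 0 else (0.75 / 0.25)


def true_outcome(x):
    return 0.2 if x == 0 else 0.6


def wrong_weight(x):
    return 1.0  # ignores the covariate shift


def wrong_outcome(x):
    return 0.0  # ignores the features entirely


# Correct weights, wrong outcome model.
est_weights_ok = dr_target_mean(source_x, source_loss, target_x, true_weight, wrong_outcome)
# Wrong weights, correct outcome model.
est_outcome_ok = dr_target_mean(source_x, source_loss, target_x, wrong_weight, true_outcome)
# Both wrong: this reduces to the naive source-sample mean.
est_both_wrong = dr_target_mean(source_x, source_loss, target_x, wrong_weight, wrong_outcome)
```

In this toy setting the true target mean loss is 0.5. The estimate recovers it when either nuisance function is correct, and falls back to the biased source mean (0.3) when both are misspecified.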

Paper Title