Paper Title
Semi-supervised Transfer Learning for Evaluation of Model Classification Performance
Authors
Abstract
In modern machine learning applications, the frequent co-occurrence of covariate shift and label scarcity poses challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt a model to an unlabeled target population using existing labeled data from a source population. However, there is little literature on transferring the performance metrics of a trained model. In this paper, we aim to evaluate the performance of a trained binary classifier on an unlabeled target population based on receiver operating characteristic (ROC) analysis. We propose $\bf S$emi-supervised $\bf T$ransfer l$\bf E$arning of $\bf A$ccuracy $\bf M$easures (STEAM), an efficient three-step estimation procedure that employs 1) double-index modeling to construct calibrated density ratio weights and 2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimators under correct specification of either the density ratio model or the outcome model. We also use cross-validation to correct for potential overfitting bias in the estimators in finite samples. We compare the proposed estimators to existing methods and show through simulations that they reduce bias and improve efficiency. We illustrate the practical utility of the proposed method by evaluating the prediction performance of a phenotyping model for Rheumatoid Arthritis (RA) on a temporally evolving EHR cohort.
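To make the idea of transferring ROC metrics with density ratio weights concrete, here is a minimal sketch of plain importance weighting. It assumes each labeled source observation already has a weight w_i approximating p_target(x_i) / p_source(x_i). It does not implement STEAM's double-index calibration, imputation, or cross-validation steps. The function names `weighted_roc_point` and `weighted_auc` are hypothetical and were chosen only for illustration.

```python
from typing import Sequence, Tuple


def weighted_roc_point(
    scores: Sequence[float],
    labels: Sequence[int],
    weights: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Density-ratio weighted (TPR, FPR) at a single threshold.

    Reweighting each labeled source point by w_i ~ p_target(x_i) / p_source(x_i)
    makes weighted averages over the source sample mimic target-population averages.
    """
    data = list(zip(scores, labels, weights))
    pos = sum(w for _, y, w in data if y == 1)
    neg = sum(w for _, y, w in data if y == 0)
    tp = sum(w for s, y, w in data if y == 1 and s > threshold)
    fp = sum(w for s, y, w in data if y == 0 and s > threshold)
    return tp / pos, fp / neg


def weighted_auc(
    scores: Sequence[float],
    labels: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Weighted Mann-Whitney estimate of the target-population AUC.

    Each (positive, negative) pair contributes w_i * w_j; ties count one half.
    With all weights equal to 1 this reduces to the ordinary empirical AUC.
    """
    data = list(zip(scores, labels, weights))
    positives = [(s, w) for s, y, w in data if y == 1]
    negatives = [(s, w) for s, y, w in data if y == 0]
    num = 0.0
    den = 0.0
    for s_pos, w_pos in positives:
        for s_neg, w_neg in negatives:
            pair_weight = w_pos * w_neg
            den += pair_weight
            if s_pos > s_neg:
                num += pair_weight
            elif s_pos == s_neg:
                num += 0.5 * pair_weight
    return num / den


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2]
    labels = [1, 0, 1, 0]
    print(weighted_auc(scores, labels, [1, 1, 1, 1]))  # 0.75 (unweighted)
    print(weighted_auc(scores, labels, [1, 1, 3, 1]))  # 0.625 (target upweights x_3)
    print(weighted_roc_point(scores, labels, [1, 1, 3, 1], 0.5))  # (0.25, 0.5)
```

In the usage example, upweighting the low-scoring positive lowers the estimated target AUC from 0.75 to 0.625. This shows how covariate shift can change a classifier's apparent accuracy even when the classifier itself is unchanged. In practice the weights would have to be estimated, for example from a source-versus-target classifier. Misspecifying them biases this purely weighted estimator, which is the issue the abstract's doubly robust construction is meant to address.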