Paper Title
Optimal Cox Regression Subsampling Procedure with Rare Events
Paper Authors
Paper Abstract
Massive survival datasets are becoming increasingly prevalent with the development of the healthcare industry. Such datasets pose computational challenges unprecedented in traditional survival analysis use cases. A popular way of coping with massive datasets is to downsample them to a more manageable size, so that the required computational resources are within the researcher's reach. Cox proportional hazards regression remains one of the most popular statistical models for the analysis of survival data. This work addresses the setting of right-censored and possibly left-truncated data with rare events, in which the observed failure times constitute only a small portion of the overall sample. We propose Cox regression subsampling-based estimators that approximate their full-data partial-likelihood-based counterparts by assigning optimal sampling probabilities to censored observations and including all observed failures in the analysis. Asymptotic properties of the proposed estimators are established under suitable regularity conditions, and simulation studies are carried out to evaluate the finite-sample performance of the estimators. We further apply our procedure to UK Biobank data on genetic and environmental risk factors for colorectal cancer.
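The sampling scheme described in the abstract (keep every observed failure, subsample the censored observations) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names `subsample_rare_events` and `weighted_partial_loglik` are hypothetical, and the inverse-probability weights and uniform default probabilities are assumptions made for illustration: the abstract does not state the optimal sampling probabilities, so they would have to be supplied through `probs`. The sketch handles right censoring only and ignores left truncation and ties.

```python
import numpy as np


def subsample_rare_events(event, q, probs=None, seed=0):
    """Keep all observed failures and draw q censored rows with replacement.

    Returns row indices into the full data and one weight per returned row.
    """
    event = np.asarray(event).astype(bool)
    rng = np.random.default_rng(seed)
    failure_idx = np.flatnonzero(event)
    censored_idx = np.flatnonzero(~event)
    if probs is None:
        # Assumption: uniform probabilities stand in for the optimal ones.
        probs = np.full(censored_idx.size, 1.0 / censored_idx.size)
    probs = np.asarray(probs, dtype=float)
    drawn = rng.choice(censored_idx.size, size=q, replace=True, p=probs)
    idx = np.concatenate([failure_idx, censored_idx[drawn]])
    # Assumption: failures get weight 1, sampled censored rows get 1 / (q * p_i).
    weights = np.concatenate(
        [np.ones(failure_idx.size), 1.0 / (q * probs[drawn])]
    )
    return idx, weights


def weighted_partial_loglik(beta, time, event, X, weights):
    """Weighted Cox partial log-likelihood (right censoring, no tie handling)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event).astype(bool)
    eta = np.asarray(X) @ np.asarray(beta)
    weighted_risk = np.asarray(weights) * np.exp(eta)
    loglik = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # risk set at the i-th failure time
        loglik += eta[i] - np.log(weighted_risk[at_risk].sum())
    return loglik


# Small synthetic demonstration with roughly 2% observed failures.
demo_rng = np.random.default_rng(42)
n = 2000
X_full = demo_rng.normal(size=(n, 2))
time_full = demo_rng.exponential(size=n)
event_full = demo_rng.random(n) < 0.02
idx, w = subsample_rare_events(event_full, q=200, seed=1)
ll = weighted_partial_loglik(
    np.zeros(2), time_full[idx], event_full[idx], X_full[idx], w
)
```

Maximizing `weighted_partial_loglik` over `beta` with a general-purpose optimizer would give a subsample-based estimate. Sampling with replacement keeps the weights simple, at the cost of possible duplicate censored rows in the subsample.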