Paper Title

Derandomizing Knockoffs

Authors

Zhimei Ren, Yuting Wei, Emmanuel Candès

Abstract

Model-X knockoffs is a general procedure that can leverage any feature importance measure to produce a variable selection algorithm, one that discovers true effects while rigorously controlling the number or fraction of false positives. It is, however, a randomized procedure that relies on the one-time construction of synthetic (random) variables. This paper introduces a derandomization method that aggregates the selection results across multiple runs of the knockoffs algorithm. The derandomization step is designed to be flexible and can be adapted to any variable selection base procedure to yield stable decisions without compromising statistical power. When applied to the base procedure of Janson et al. (2016), we prove that derandomized knockoffs controls both the per-family error rate (PFER) and the k-familywise error rate (k-FWER). Further, we carry out extensive numerical studies demonstrating tight type-I error control and markedly enhanced power compared with alternative variable selection algorithms. Finally, we apply our approach to multi-stage genome-wide association studies of prostate cancer and report locations on the genome that are significantly associated with the disease. When cross-referenced with other studies, we find that the reported associations have been replicated.
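
The aggregation idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the aggregation rule, so the code assumes one plausible choice: keep a variable when it is selected in at least a fraction `eta` of the runs. The names `derandomize`, `toy_base_procedure`, `n_runs`, and `eta` are hypothetical, and the toy base procedure is a deterministic stand-in for a real knockoffs run, not the procedure of Janson et al. (2016).

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def derandomize(
    base_procedure: Callable[[int], Iterable[int]],
    n_runs: int = 30,
    eta: float = 0.5,
) -> List[int]:
    """Aggregate the selections of repeated runs of a randomized procedure.

    base_procedure(seed) is a hypothetical callable returning the variable
    indices selected in one run. eta is an assumed selection-frequency
    threshold; it is not taken from the abstract.
    """
    if n_runs <= 0:
        raise ValueError("n_runs must be positive")
    counts: Counter = Counter()
    for seed in range(n_runs):
        # set() ensures each variable is counted at most once per run.
        counts.update(set(base_procedure(seed)))
    # Keep variables whose selection frequency reaches the threshold.
    return sorted(j for j, c in counts.items() if c / n_runs >= eta)


def toy_base_procedure(seed: int) -> Set[int]:
    """Stand-in for one knockoffs run (assumption, for illustration only).

    Variables 0 and 1 are selected in every run; one extra variable,
    which depends on the seed, mimics run-to-run variability.
    """
    return {0, 1, 2 + seed % 10}


stable = derandomize(toy_base_procedure, n_runs=30, eta=0.5)
print(stable)  # [0, 1]
```

In this toy setting each spurious variable appears in only 3 of the 30 runs, so it falls below the assumed threshold, while the two variables selected in every run survive. With a real base procedure, `base_procedure` would generate fresh knockoff copies on each call and return that run's selection set.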
