Structural randomised selection

Paper title

Structural randomised selection

Authors

Fan Wang, Sylvia Richardson, Steven M. Hill

Abstract

An important problem in the analysis of high-dimensional omics data is to identify subsets of molecular variables that are associated with a phenotype of interest. This requires addressing the challenges of high dimensionality, strong multicollinearity and model uncertainty. We propose a new ensemble learning approach for improving the performance of sparse penalised regression methods, called STructural RANDomised Selection (STRANDS). The approach, which builds and improves upon the Random Lasso method, consists of two steps. In both steps, we reduce dimensionality by repeated subsampling of variables. We apply a penalised regression method to each subsampled dataset and average the results. In the first step, subsampling is informed by variable correlation structure, and in the second step, by variable importance measures from the first step. STRANDS can be used with any sparse penalised regression approach as the "base learner". Using synthetic data and real biological datasets, we demonstrate that STRANDS typically improves upon its base learner, and that taking account of the correlation structure in the first step can help to improve the efficiency with which the model space may be explored.
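The two-step idea in the abstract (subsample variables, fit a sparse penalised regression on each subsample, average the coefficients, then resample using the resulting importance measures) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the correlation structure informs the first-step sampling, so the inverse-total-correlation weights below are a stand-in assumption, as are all function names, the plain coordinate-descent lasso used as base learner, and the values of `q`, `n_rounds` and `lam`.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # equals y - X @ beta while beta is zero
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # drop variable j's current contribution
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated contribution back
    return beta

def subsample_average(X, y, probs, q, n_rounds, lam, rng):
    """Fit the base learner on random subsets of q variables and average.

    A variable left out of a subset contributes a zero coefficient for that round.
    """
    p = X.shape[1]
    total = np.zeros(p)
    for _ in range(n_rounds):
        idx = rng.choice(p, size=q, replace=False, p=probs)
        total[idx] += lasso_cd(X[:, idx], y, lam)
    return total / n_rounds

def two_step_selection(X, y, q=5, n_rounds=50, lam=0.1, seed=0):
    """Hypothetical two-step randomised selection (illustrative only)."""
    rng = np.random.default_rng(seed)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # assumes no constant columns
    yc = y - y.mean()
    # Step 1 (ASSUMPTION): down-weight variables that are strongly correlated
    # with many others, so subsets are spread across correlated groups.
    abs_corr = np.abs(np.corrcoef(Xs, rowvar=False))
    w1 = 1.0 / abs_corr.sum(axis=1)
    importance = np.abs(subsample_average(Xs, yc, w1 / w1.sum(), q, n_rounds, lam, rng))
    # Step 2: sample in proportion to step-1 importance; the small constant
    # keeps every variable eligible for selection.
    w2 = importance + 1e-8
    return subsample_average(Xs, yc, w2 / w2.sum(), q, n_rounds, lam, rng)

# Synthetic example: 20 variables, only the first two carry signal.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + 0.5 * data_rng.normal(size=100)
coef = two_step_selection(X, y)
print(np.round(coef, 2))
```

The returned coefficients are on the standardised scale of `X`. Any sparse penalised regression could replace `lasso_cd` as the base learner, which mirrors the abstract's point that the scheme is not tied to one method.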
