Paper title

Fast cross-validation for multi-penalty ridge regression

Authors

Mark A. van de Wiel, Mirrelijn M. van Nee, Armin Rauschenberger

Abstract

High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data type specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, rendering a speed-up of several orders of magnitude. We developed a flexible framework that facilitates multiple types of response, unpenalized covariates, several performance criteria and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners.
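The low-dimensional shortcut can be illustrated with a standard matrix identity. For a block-diagonal penalty matrix Λ (penalty λ_b on data type b) and IWLS weights W = diag(w), the sample-weighted hat matrix X(XᵀWX + Λ)⁻¹XᵀW equals K(WK + I)⁻¹W, where K = Σ_b X_b X_bᵀ / λ_b. Once the n×n Gram matrices X_b X_bᵀ are precomputed, each new penalty vector and each IWLS step needs only n×n algebra. The NumPy sketch below is a hypothetical illustration of that idea, not the multiridge code: it assumes a logistic response with no intercept or other unpenalized covariates and uses simulated data. The helper names kernel_from_blocks, weighted_hat_matrix and logistic_ridge_iwls are invented for this example. It checks the n-dimensional results against the direct p-dimensional formulas.

```python
import numpy as np


def kernel_from_blocks(grams, lambdas):
    """Hypothetical helper: K = sum_b X_b X_b^T / lambda_b = X Lambda^{-1} X^T."""
    return sum(G / lam for G, lam in zip(grams, lambdas))


def weighted_hat_matrix(K, w):
    """Hypothetical helper: n x n form K (W K + I)^{-1} W of X (X^T W X + Lambda)^{-1} X^T W."""
    W = np.diag(w)
    return K @ np.linalg.solve(W @ K + np.eye(K.shape[0]), W)


def logistic_ridge_iwls(K, y, max_iter=50, tol=1e-10):
    """Hypothetical helper: penalized IWLS for logistic ridge (no intercept),
    iterating on the linear predictor eta = X beta entirely in n-dimensional space."""
    eta = np.zeros(len(y))
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities
        w = mu * (1.0 - mu)                      # IWLS weights
        z = eta + (y - mu) / w                   # working response
        eta_new = weighted_hat_matrix(K, w) @ z  # updated eta = H z
        if np.max(np.abs(eta_new - eta)) < tol:
            return eta_new
        eta = eta_new
    return eta


# Simulated example with two data types (assumed sizes, no real data)
rng = np.random.default_rng(0)
n, p1, p2 = 30, 200, 50
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
y = (rng.random(n) < 0.5).astype(float)
lambdas = [100.0, 20.0]

grams = [X1 @ X1.T, X2 @ X2.T]  # computed once; reusable for any lambdas
K = kernel_from_blocks(grams, lambdas)

# 1) Hat matrix: n-dimensional formula vs direct p-dimensional formula
X = np.hstack([X1, X2])
Lam = np.diag(np.r_[np.full(p1, lambdas[0]), np.full(p2, lambdas[1])])
w_test = rng.uniform(0.1, 0.25, n)
H_fast = weighted_hat_matrix(K, w_test)
H_slow = X @ np.linalg.solve(X.T @ (w_test[:, None] * X) + Lam, X.T * w_test)
print("max |H_fast - H_slow|:", np.max(np.abs(H_fast - H_slow)))

# 2) IWLS fit: n-dimensional updates vs Newton steps on beta in p dimensions
eta_fast = logistic_ridge_iwls(K, y)
beta = np.zeros(p1 + p2)
for _ in range(50):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)
    z = eta + (y - mu) / w
    beta = np.linalg.solve(X.T @ (w[:, None] * X) + Lam, X.T @ (w * z))
eta_slow = X @ beta
print("max |eta_fast - eta_slow|:", np.max(np.abs(eta_fast - eta_slow)))
```

Only the Gram matrices touch the high-dimensional data. A cross-validation search over (λ_1, λ_2) could therefore reuse grams, since the training-fold Gram matrices are just row and column subsets of the full ones. The framework described in the abstract additionally supports unpenalized covariates and other response types, which this sketch omits.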