Paper title

Envelopes and principal component regression

Authors

Zhang, Xin; Deng, Kai; Mai, Qing

Abstract

Envelope methods offer targeted dimension reduction for various models. The overarching goal is to improve efficiency in multivariate parameter estimation by projecting the data onto a lower-dimensional subspace known as the envelope. Envelope approaches have advantages in analyzing data with highly correlated variables, but their iterative Grassmannian optimization algorithms do not scale well to ultra-high-dimensional data. While the connections between envelopes and partial least squares in multivariate linear regression have promoted recent progress in high-dimensional studies of envelopes, we propose a more straightforward way of envelope modeling from a novel principal components regression perspective. The proposed procedure, Non-Iterative Envelope Component Estimation (NIECE), has excellent computational advantages over the iterative Grassmannian optimization alternatives in high dimensions. We develop a unified NIECE theory that bridges the gap between envelope methods and principal components in regression. The new theoretical insights also shed light on the envelope subspace estimation error as a function of the eigenvalue gaps of two symmetric positive definite matrices used in envelope modeling. We apply the new theory and algorithm to several envelope models, including response and predictor reduction in multivariate linear models, logistic regression, and the Cox proportional hazards model. Simulations and illustrative data analysis show the potential for NIECE to significantly improve standard methods in linear and generalized linear models.
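
For readers unfamiliar with the principal components regression viewpoint the abstract refers to, the following is a minimal sketch of classical principal components regression in NumPy. It is not the NIECE procedure, whose details are not given on this page. The function name `pcr_fit`, the toy data, and the choice of two components are all assumptions made purely for illustration.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Classical principal components regression (illustrative; not NIECE)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered predictors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T          # p x k basis of the retained subspace
    Z = Xc @ V                       # n x k matrix of component scores
    # Ordinary least squares of the centered response on the scores.
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V @ gamma                 # coefficients in the original coordinates
    intercept = y_mean - x_mean @ beta
    return beta, intercept


# Hypothetical toy data: ten highly correlated predictors driven by two factors.
rng = np.random.default_rng(0)
n, p = 200, 10
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X = latent @ loadings + 0.01 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)

beta, intercept = pcr_fit(X, y, n_components=2)
pred = X @ beta + intercept
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
```

Classical PCR keeps the leading principal directions of the predictors without looking at the response. The abstract's point is that envelope methods perform targeted dimension reduction, so this sketch only illustrates the "project onto a low-dimensional subspace, then regress" idea.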
