Title
Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression
Authors
Abstract
Given $m$ $d$-dimensional response variables and $n$ $d$-dimensional predictors, sparse regression selects at most $k$ predictors, $1\leq k \leq d-1$, to linearly approximate each response. The key problem in sparse regression is subset selection, which usually suffers from high computational cost. In recent years, many improved approximate methods of subset selection have been published. However, less attention has been paid to non-approximate subset selection, which is essential for many problems in data analysis. Here we consider sparse regression from the viewpoint of correlation and propose a formula for conditional uncorrelation. We then propose an efficient non-approximate subset-selection method in which no coefficients of the regression equation need to be computed for candidate predictors. With the proposed method, the computational complexity is reduced from $O(\frac{1}{6}{k^3}+mk^2+mkd)$ to $O(\frac{1}{6}{k^3}+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression. Because the dimension $d$ is generally the number of observations or experiments and is sufficiently large, the proposed method can greatly improve the efficiency of non-approximate subset selection.
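To make the stated complexity reduction concrete, the two per-subset operation counts can be compared numerically. The following is a minimal sketch, not part of the paper; the functions and the sample values of $m$, $k$, $d$ are illustrative assumptions used only to evaluate the two bounds.

```python
def baseline_ops(m: int, k: int, d: int) -> float:
    # Per-subset cost of the standard approach: O(k^3/6 + m*k^2 + m*k*d).
    # The m*k*d term dominates when d (number of observations) is large.
    return k**3 / 6 + m * k**2 + m * k * d

def proposed_ops(m: int, k: int, d: int) -> float:
    # Per-subset cost with the conditional-uncorrelation criterion:
    # O(k^3/6 + m*k^2/2); note it no longer depends on d.
    return k**3 / 6 + m * k**2 / 2

# Illustrative values (assumptions, not taken from the paper):
m, k, d = 10, 5, 1000
speedup = baseline_ops(m, k, d) / proposed_ops(m, k, d)
print(f"baseline: {baseline_ops(m, k, d):.1f} ops")
print(f"proposed: {proposed_ops(m, k, d):.1f} ops")
print(f"speedup:  {speedup:.1f}x")
```

Since the $mkd$ term is eliminated, the speedup grows roughly linearly in $d$ for fixed $m$ and $k$, which matches the abstract's observation that the gain is largest when the number of observations is large.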