Paper Title
Asymptotics of Ridge(less) Regression under General Source Condition
Paper Authors
Paper Abstract
We analyze the prediction error of ridge regression in an asymptotic regime where the sample size and dimension go to infinity at a proportional rate. In particular, we consider the role played by the structure of the true regression parameter. We observe that the case of a general deterministic parameter can be reduced to the case of a random parameter drawn from a structured prior. The latter assumption is a natural adaptation of classic smoothness assumptions in nonparametric regression, known as source conditions in the context of regularisation theory for inverse problems. Roughly speaking, we assume the large coefficients of the parameter correspond to the principal components. In this setting, a precise characterisation of the test error is obtained, depending on the input covariance and the structure of the regression parameter. We illustrate this characterisation in a simplified setting to investigate the influence of the true parameter on optimal regularisation for overparameterized models. We show that interpolation (no regularisation) can be optimal even with bounded signal-to-noise ratio (SNR), provided that the parameter coefficients are larger along high-variance directions of the data, corresponding to a more regular function than posited by the regularisation term. This contrasts with previous work considering ridge regression with an isotropic prior, in which case interpolation is only optimal in the limit of infinite SNR.
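To make the setting concrete, here is a minimal numerical sketch (not the paper's analytical characterisation) of ridge regression in an overparameterized, proportional regime. It assumes a diagonal input covariance with polynomially decaying eigenvalues and a true parameter whose coefficients are larger along the high-variance directions, loosely in the spirit of a source condition; all specific choices (n, d, the decay exponent alpha, the noise level) are illustrative and not taken from the paper. The ridgeless case lambda = 0 is implemented as the minimum-norm interpolator.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 400       # overparameterized proportional regime: d / n = 2
sigma_noise = 0.5     # noise level, so the SNR is bounded (here SNR = 4)

# Diagonal input covariance with polynomially decaying eigenvalues.
eigvals = np.arange(1, d + 1) ** -1.0

# Illustrative "aligned" parameter: coefficients decay with the eigenvalues,
# i.e. the signal is larger along high-variance directions of the data.
alpha = 1.0
theta = eigvals ** alpha
theta /= np.linalg.norm(np.sqrt(eigvals) * theta)   # normalize signal power to 1

def excess_test_error(lmbda, n_test=5000):
    """Excess prediction error of ridge with regularisation strength lmbda."""
    X = rng.normal(size=(n, d)) * np.sqrt(eigvals)
    y = X @ theta + sigma_noise * rng.normal(size=n)
    if lmbda == 0.0:
        # Ridgeless: minimum-norm interpolator of the training data.
        theta_hat = np.linalg.pinv(X) @ y
    else:
        theta_hat = np.linalg.solve(X.T @ X + n * lmbda * np.eye(d), X.T @ y)
    X_test = rng.normal(size=(n_test, d)) * np.sqrt(eigvals)
    return np.mean((X_test @ (theta_hat - theta)) ** 2)

for lmbda in [0.0, 1e-3, 1e-2, 1e-1, 1.0]:
    errs = [excess_test_error(lmbda) for _ in range(5)]
    print(f"lambda = {lmbda:6.3g}   excess test MSE = {np.mean(errs):.4f}")
```

Sweeping alpha toward 0 (a flat, isotropic-like parameter) or varying sigma_noise lets one probe how the alignment between the parameter and the leading eigendirections moves the regularisation strength that minimizes the test error, which is the qualitative question the abstract addresses.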