Paper title
More efficient approximation of smoothing splines via space-filling basis selection
Paper authors
Paper abstract
We consider the problem of approximating smoothing spline estimators in a nonparametric regression model. When applied to a sample of size $n$, the smoothing spline estimator can be expressed as a linear combination of $n$ basis functions, requiring $O(n^3)$ computational time when the number of predictors $d \geq 2$. Such a sizable computational cost hinders the broad applicability of smoothing splines. In practice, the full-sample smoothing spline estimator can be approximated by an estimator based on $q$ randomly selected basis functions, resulting in a computational cost of $O(nq^2)$. It is known that these two estimators converge at the same rate when $q$ is of the order $O\{n^{2/(pr+1)}\}$, where $p \in [1,2]$ depends on the true function $\eta$, and $r > 1$ depends on the type of spline. Such a $q$ is called the essential number of basis functions. In this article, we develop a more efficient basis selection method. By selecting the basis functions that correspond to roughly equally spaced observations, the proposed method chooses a set of basis functions with large diversity. The asymptotic analysis shows that our proposed smoothing spline estimator can decrease $q$ to roughly $O\{n^{1/(pr+1)}\}$ when $d \leq pr+1$. Applications to synthetic and real-world datasets show that the proposed method leads to a smaller prediction error than other basis selection methods.
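The idea of keeping only $q$ basis functions tied to roughly equally spaced observations can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function names (`select_equally_spaced`, `cubic_kernel`, `fit_reduced_spline`), the choice of a plain equally spaced grid, the cubic spline kernel $R(s,t)=\min(s,t)^2\{3\max(s,t)-\min(s,t)\}/6$ on $[0,1]$, and the fixed smoothing parameter `lam` are all assumptions made for illustration. The fit solves a $(q+2)\times(q+2)$ penalized least-squares system, so forming it costs $O(nq^2)$.

```python
import numpy as np


def select_equally_spaced(x, q):
    """Indices of the observations nearest to q equally spaced grid points.

    Hypothetical helper: a 1-D stand-in for a space-filling selection rule.
    Duplicates are dropped, so fewer than q indices may be returned.
    """
    x = np.asarray(x, dtype=float)
    grid = np.linspace(x.min(), x.max(), q)
    nearest = np.abs(x[:, None] - grid[None, :]).argmin(axis=0)
    return np.unique(nearest)


def cubic_kernel(s, t):
    """Cubic spline kernel on [0, 1]: integral of (s-u)_+ (t-u)_+ over u."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo**2 * (3.0 * hi - lo) / 6.0


def fit_reduced_spline(x, y, q, lam):
    """Penalized least squares on q selected basis functions (x assumed in [0, 1]).

    Model: f(x) = d0 + d1 * x + sum_j c_j R(x, z_j), with penalty n * lam * c' Q c,
    where z_j are the selected observations and Q = R(z, z).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    z = x[select_equally_spaced(x, q)]

    # Design matrix: unpenalized linear part plus the selected kernel columns.
    A = np.hstack([np.column_stack([np.ones(n), x]), cubic_kernel(x, z)])

    # Penalty acts only on the kernel coefficients.
    P = np.zeros((A.shape[1], A.shape[1]))
    P[2:, 2:] = cubic_kernel(z, z)

    # lstsq tolerates the rank deficiency that arises when z contains 0.
    beta, *_ = np.linalg.lstsq(A.T @ A + n * lam * P, A.T @ y, rcond=None)

    def predict(x_new):
        x_new = np.asarray(x_new, dtype=float)
        linear = np.column_stack([np.ones(x_new.size), x_new]) @ beta[:2]
        return linear + cubic_kernel(x_new, z) @ beta[2:]

    return predict
```

A call such as `predict = fit_reduced_spline(x, y, q=20, lam=1e-6)` returns a function that evaluates the fitted curve at new points. In practice the smoothing parameter would be chosen by a criterion such as generalized cross-validation rather than fixed by hand, and for $d \geq 2$ the grid would be replaced by a space-filling design over the predictor space.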