Paper Title
Gaussian Process Regression in the Flat Limit
Paper Authors
Paper Abstract
Gaussian process (GP) regression is a fundamental tool in Bayesian statistics. It is also known as kriging and is the Bayesian counterpart of frequentist kernel ridge regression. Most of the theoretical work on GP regression has focused on large-$n$ asymptotics, i.e. on what happens as the amount of data increases. Fixed-sample analysis is much more difficult outside of simple cases, such as locations on a regular grid. In this work we perform a fixed-sample analysis in an asymptotic regime first studied in the context of approximation theory by Driscoll & Fornberg (2002), called the ``flat limit''. In flat-limit asymptotics, the goal is to characterise kernel methods as the length-scale of the kernel function tends to infinity, so that the kernel appears flat over the range of the data. Surprisingly, this limit is well-defined and displays interesting behaviour: Driscoll & Fornberg showed that, if the kernel is Gaussian, radial basis function interpolation converges in the flat limit to polynomial interpolation. Subsequent work showed that this holds true in the multivariate setting as well, but that kernels other than the Gaussian may have (polyharmonic) splines as the limiting interpolant. Leveraging recent results on the spectral behaviour of kernel matrices in the flat limit, we study the flat limit of Gaussian process regression. We show that, depending on the kernel, Gaussian process regression tends in the flat limit to (multivariate) polynomial regression or to (polyharmonic) spline regression. Importantly, this holds for both the predictive mean and the predictive variance, so that the posterior predictive distributions become equivalent. Our results have practical consequences: for instance, they show that optimal GP predictions in the sense of leave-one-out loss may occur at very large length-scales, which would be invisible to current implementations because of numerical difficulties.
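To make the last point concrete, here is a minimal numerical sketch (not the authors' code; the data, kernel, noise level and length-scale grid are arbitrary illustrative choices). It evaluates the standard closed-form leave-one-out error of a Gaussian-kernel GP across a range of length-scales, together with the condition number of the kernel matrix, which blows up as the kernel flattens and is the source of the numerical difficulties mentioned above.

```python
# Illustrative sketch: leave-one-out (LOO) error of GP regression versus length-scale,
# alongside the condition number of the kernel matrix. As the length-scale grows the
# predictive distribution converges (the flat limit), but cond(K) explodes, which is
# why this regime is hard to reach with naive float64 implementations.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.0, 1.0, size=40))          # toy 1-d inputs
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(40)   # noisy observations

def gaussian_kernel(a, b, lengthscale):
    """Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 l^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_diagnostics(x, y, lengthscale, noise_var=1e-2):
    """LOO mean-squared error via the closed-form identity (see e.g. Rasmussen &
    Williams, ch. 5) and the condition number of the kernel matrix K."""
    K = gaussian_kernel(x, x, lengthscale)
    A = K + noise_var * np.eye(len(x))
    A_inv = np.linalg.inv(A)
    alpha = A_inv @ y
    loo_residuals = alpha / np.diag(A_inv)   # y_i minus the LOO predictive mean at x_i
    return np.mean(loo_residuals ** 2), np.linalg.cond(K)

# Sweep length-scales far beyond the spread of the data.
for ell in [0.1, 1.0, 10.0, 100.0, 1000.0]:
    mse, cond = gp_diagnostics(x, y, ell)
    print(f"lengthscale = {ell:8.1f}   LOO MSE = {mse:.4f}   cond(K) = {cond:.2e}")
```

The numbers produced on this toy data are not meaningful in themselves; the point is that the condition number of the kernel matrix grows without bound as the length-scale increases, even though the limiting object (polynomial or spline regression) is perfectly well-behaved.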