Paper Title

Exact Gaussian Processes for Massive Datasets via Non-Stationary Sparsity-Discovering Kernels

Authors

Noack, Marcus M., Krishnan, Harinarayan, Risser, Mark D., Reyes, Kristofer G.

Abstract

A Gaussian Process (GP) is a prominent mathematical framework for stochastic function approximation in science and engineering applications. This success is largely attributed to the GP's analytical tractability, robustness, non-parametric structure, and natural inclusion of uncertainty quantification. Unfortunately, the use of exact GPs is prohibitively expensive for large datasets due to their unfavorable numerical complexity of $O(N^3)$ in computation and $O(N^2)$ in storage. All existing methods addressing this issue utilize some form of approximation -- usually considering subsets of the full dataset or finding representative pseudo-points that render the covariance matrix well-structured and sparse. These approximate methods can lead to inaccuracies in function approximations and often limit the user's flexibility in designing expressive kernels. Instead of inducing sparsity via data-point geometry and structure, we propose to take advantage of naturally-occurring sparsity by allowing the kernel to discover -- instead of induce -- sparse structure. The premise of this paper is that GPs, in their most native form, are often naturally sparse, but commonly-used kernels do not allow us to exploit this sparsity. The core concept of exact, and at the same time sparse, GPs relies on kernel definitions that provide enough flexibility to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly-supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
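The abstract's central idea, a compactly-supported kernel whose exact zeros make the covariance matrix sparse, can be illustrated with a short sketch. The following is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses a stationary Wendland C2 kernel with a fixed support radius, and the function names and parameter values (`sparse_covariance`, `radius=0.2`, etc.) are hypothetical. The paper's kernels additionally make such quantities non-stationary and learnable, so that the sparsity pattern itself is discovered during training.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse import csr_matrix

def wendland_c2(r, radius):
    # Wendland C^2 compactly-supported kernel (valid in up to 3 dimensions):
    # covariances are EXACTLY zero for pairs farther apart than `radius`.
    q = r / radius
    return np.where(q < 1.0, (1.0 - q) ** 4 * (4.0 * q + 1.0), 0.0)

def sparse_covariance(X, signal_var=1.0, radius=0.2):
    # Dense pairwise distances are used here only for clarity; a scalable
    # implementation would use a neighbor search (e.g., a KD-tree) to avoid
    # ever forming the O(N^2) dense matrix.
    r = cdist(X, X)
    K = signal_var * wendland_c2(r, radius)
    # Converting to CSR stores only the non-zero covariances.
    return csr_matrix(K)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(2000, 2))
    K = sparse_covariance(X)
    density = K.nnz / (K.shape[0] * K.shape[1])
    print(f"covariance matrix density: {density:.2%}")
```

Because covariances beyond the support radius are exactly zero rather than approximately small, the GP remains exact: sparse linear algebra (for instance, a sparse Cholesky factorization) can then replace the dense $O(N^3)$ solve, which is what allows the approach described above to scale well beyond 5 million data points.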
