Paper Title

Near Input Sparsity Time Kernel Embeddings via Adaptive Sampling

Authors

David P. Woodruff, Amir Zandieh

Abstract

To accelerate kernel methods, we propose a near input sparsity time algorithm for sampling the high-dimensional feature space implicitly defined by a kernel transformation. Our main contribution is an importance sampling method for subsampling the feature space of a degree $q$ tensoring of data points in almost input sparsity time, improving the recent oblivious sketching method of Ahle et al. (2020) by a factor of $q^{5/2}/\epsilon^2$. This leads to a subspace embedding for the polynomial kernel, as well as the Gaussian kernel, with a target dimension that is only linearly dependent on the statistical dimension of the kernel, and in time which is only linearly dependent on the sparsity of the input dataset. We show how our subspace embedding bounds imply new statistical guarantees for kernel ridge regression. Furthermore, we empirically show that in large-scale regression tasks, our algorithm outperforms state-of-the-art kernel approximation methods.
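To make the idea of subsampling a degree-$q$ tensored feature space concrete, here is a minimal, self-contained sketch (not the paper's algorithm, which uses a more refined importance distribution and achieves near input sparsity time). It draws $m$ coordinate tuples $(j_1,\dots,j_q)$ of the tensor product feature map $x^{\otimes q}$, sampling each coordinate in proportion to the squared column norms of the data as a simple importance distribution, and rescales each sampled monomial so that inner products of the sketched features are unbiased estimates of the polynomial kernel $(x^\top y)^q$. The function name `sampled_poly_features` is illustrative, not from the paper.

```python
import numpy as np

def sampled_poly_features(X, q, m, rng):
    """Monte-Carlo sketch of the degree-q tensor feature map x^{(x) q}.

    Each of the m output features picks a tuple (j_1, ..., j_q) of input
    coordinates i.i.d. with probability proportional to the squared column
    norms of X (a simple importance distribution, not the paper's), then
    evaluates the rescaled monomial x[j_1] * ... * x[j_q]."""
    n, d = X.shape
    p = (X ** 2).sum(axis=0)
    p = p / p.sum()                           # importance distribution over coordinates
    idx = rng.choice(d, size=(m, q), p=p)     # m tuples of q sampled coordinates
    scale = np.sqrt(m * p[idx].prod(axis=1))  # rescaling that makes Z Z^T unbiased
    return X[:, idx].prod(axis=2) / scale     # sketched features, shape (n, m)

# Sanity check: Z Z^T approximates the polynomial kernel (X X^T)^q entrywise.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
q, m = 3, 20000
Z = sampled_poly_features(X, q, m, rng)
K_exact = (X @ X.T) ** q
K_approx = Z @ Z.T
```

With this rescaling, $\mathbb{E}[Z Z^\top] = (X X^\top)^q$ exactly, and the variance shrinks as $m$ grows; the paper's contribution is an adaptive importance distribution for which a target dimension depending only linearly on the kernel's statistical dimension suffices.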
