论文标题

约翰逊·林斯特劳斯(Johnson-Lindenstrauss)以外的欧几里得长度和距离

Approximate Euclidean lengths and distances beyond Johnson-Lindenstrauss

论文作者

Sobczyk, Aleksandros, Luisier, Mathieu

论文摘要

约翰逊(Johnson)和林登斯(Lindenstrauss)的经典结果指出,一组$ n $高维数据点可以预测到$ o(\ log n/ε^2)$尺寸,因此其成对距离的平方可保留至小失真$ε\ in(0,1)$。已经证明,JL引理对一般情况是最佳的,因此,只能在特殊情况下探索改进。这项工作旨在根据受Hutch ++算法启发的技术提高$ε^{ - 2} $依赖关系,该技术将$ε^{ - 2} $减少到$ε^{ - 2} $至$ε^{ - 1} $,用于隐式矩阵跟踪估计的相关问题。我们首先提出了一种算法,以估计矩阵行的欧几里得长度。我们证明了它的元素概率边界,在最坏情况下至少与标准JL近似值一样好,但对于具有衰减光谱的矩阵而言,渐近上的更好。此外,对于任何矩阵,无论其频谱如何,该算法都可以实现$ε$ - 准确性,总计,frobenius norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-norm-nord-norm-norder误差都只能仅使用$ o(ε^{ - 1})$查询。这是对标准JL近似值的规范误差的二次改进。我们还展示了如何扩展这些结果以估算(i)数据点之间的欧几里得距离和(ii)高染成型数据矩阵的统计杠杆评分,这些矩阵对于许多应用而无处不在,并且具有类似的理论改进。提出了概念证明的数值实验以验证理论分析。

A classical result of Johnson and Lindenstrauss states that a set of $n$ high dimensional data points can be projected down to $O(\log n/ε^2)$ dimensions such that the square of their pairwise distances is preserved up to a small distortion $ε\in(0,1)$. It has been proved that the JL lemma is optimal for the general case, therefore, improvements can only be explored for special cases. This work aims to improve the $ε^{-2}$ dependency based on techniques inspired by the Hutch++ Algorithm, which reduces $ε^{-2}$ to $ε^{-1}$ for the related problem of implicit matrix trace estimation. We first present an algorithm to estimate the Euclidean lengths of the rows of a matrix. We prove for it element-wise probabilistic bounds that are at least as good as standard JL approximations in the worst-case, but are asymptotically better for matrices with decaying spectrum. Moreover, for any matrix, regardless of its spectrum, the algorithm achieves $ε$-accuracy for the total, Frobenius norm-wise relative error using only $O(ε^{-1})$ queries. This is a quadratic improvement over the norm-wise error of standard JL approximations. We also show how these results can be extended to estimate (i) the Euclidean distances between data points and (ii) the statistical leverage scores of tall-and-skinny data matrices, which are ubiquitous for many applications, with analogous theoretical improvements. Proof-of-concept numerical experiments are presented to validate the theoretical analysis.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源