Paper Title

Johnson-Lindenstrauss embeddings for noisy vectors -- taking advantage of the noise

Authors

Shao, Zhen

Abstract

This paper investigates theoretical properties of subsampling and hashing as tools for approximate Euclidean norm-preserving embeddings of vectors with (unknown) additive Gaussian noise. Such embeddings are sometimes called Johnson-Lindenstrauss embeddings, after the celebrated lemma. Previous work shows that, as sparse embeddings, the success of subsampling and hashing depends closely on the $l_\infty$-to-$l_2$ ratio of the vector to be mapped. This paper shows that the presence of noise removes this constraint in high dimensions; in other words, sparse embeddings such as subsampling and hashing, with embedding dimensions comparable to those of dense embeddings, have similar approximate norm-preserving dimensionality-reduction properties. The key insight is that the noise should be treated as information to be exploited, not merely something to be removed. Theoretical bounds for subsampling and hashing to recover the approximate norm of a high-dimensional vector in the presence of noise are derived, with numerical illustrations showing that better performance is achieved in the presence of noise.
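The hashing embedding discussed in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a count-sketch-style hashing matrix (one random row and one random sign per coordinate), with hypothetical dimensions `n = 100_000` and `m = 2_000` chosen only for demonstration. The input is a sparse signal, whose large $l_\infty/l_2$ ratio is the regime where sparse embeddings are known to struggle, plus additive Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100_000, 2_000  # ambient and embedding dimensions (illustrative choices)

# A sparse signal plus additive Gaussian noise. Without the noise, the
# signal's large l_inf/l_2 ratio would make it a hard case for sparse
# embeddings; the noise spreads mass across all coordinates.
x = np.zeros(n)
x[:10] = 10.0
y = x + 0.1 * rng.standard_normal(n)  # noisy observation

# Hashing (count-sketch-style) embedding: each coordinate is assigned a
# uniformly random row and an independent random sign, and coordinates
# mapped to the same row are summed. This preserves the squared Euclidean
# norm in expectation without additional scaling.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
Sy = np.zeros(m)
np.add.at(Sy, rows, signs * y)  # unbuffered accumulation handles row collisions

ratio = np.linalg.norm(Sy) / np.linalg.norm(y)
print(f"norm ratio (hashing, noisy input): {ratio:.3f}")
```

With these parameters the printed ratio is typically close to 1, consistent with the abstract's claim that noise makes sparse embeddings approximately norm-preserving at embedding dimensions comparable to dense ones.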
