Paper title

The Power of Hashing with Mersenne Primes

Authors

Thomas Dybdahl Ahle, Jakob Tejs Bæk Knudsen, Mikkel Thorup

Abstract

The classic way of computing a $k$-universal hash function is to use a random degree-$(k-1)$ polynomial over a prime field $\mathbb Z_p$. For fast evaluation of the polynomial, the prime $p$ is often chosen as a Mersenne prime $p=2^b-1$. In this paper, we show that using Mersenne primes has further advantages. Our view is that the hash function's output is a $b$-bit integer that is uniformly distributed in $\{0, \dots, 2^b-1\}$, except that $p$ (the all-ones value in binary) is missing. Uniform bit strings have many nice properties, such as splitting into substrings, which gives us two or more hash functions for the cost of one while preserving strong theoretical qualities. We call this trick "Two for one" hashing, and we demonstrate it on 4-universal hashing in the classic Count Sketch algorithm for second-moment estimation. We also provide new fast, branch-free code for division and modulus with Mersenne primes. In contrast to our analytic work, this code generalizes to any pseudo-Mersenne prime $p=2^b-c$ with small $c$.
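
The classic construction the abstract starts from can be sketched as follows. This is a minimal Python sketch, assuming the Mersenne prime $p = 2^{61}-1$ (so $b = 61$); the names `mod_mersenne`, `random_coeffs` and `poly_hash` are hypothetical. The reduction relies on the standard identity $2^b \equiv 1 \pmod p$, which replaces division by shifts, masks and additions; it is not the paper's branch-free division code.

```python
import random

# Illustrative parameters: 2**61 - 1 is a Mersenne prime, so b = 61.
B = 61
P = (1 << B) - 1


def mod_mersenne(x: int) -> int:
    """Return x mod P for a non-negative integer x, using only shifts, masks and adds.

    Since 2**B ≡ 1 (mod P), writing x = hi * 2**B + lo gives x ≡ hi + lo (mod P),
    so the high bits can be folded onto the low bits until the value fits.
    """
    while x > P:
        x = (x & P) + (x >> B)
    # Now 0 <= x <= P; the value P itself is congruent to 0.
    return 0 if x == P else x


def random_coeffs(k: int, rng: random.Random) -> list[int]:
    """Draw k random coefficients a_0, ..., a_{k-1} in Z_P (a degree-(k-1) polynomial)."""
    return [rng.randrange(P) for _ in range(k)]


def poly_hash(coeffs: list[int], x: int) -> int:
    """Evaluate sum_i a_i * x**i mod P with Horner's rule; keys x are assumed to lie in [0, P)."""
    acc = 0
    for a in reversed(coeffs):  # start from the leading coefficient a_{k-1}
        acc = mod_mersenne(acc * x + a)
    return acc
```

With `k = 4` coefficients this gives a 4-universal family. Every output lies in $\{0, \dots, 2^b-2\}$, so it is a $b$-bit string that never equals the all-ones value $p$, which is the view the abstract builds on.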
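
The "Two for one" idea can be illustrated on a single-row Count Sketch. In this hypothetical Python sketch (the class `CountSketch` and its methods are illustrative names), one 4-universal hash value, from a random degree-3 polynomial modulo $p = 2^{61}-1$, is split into two substrings: bit 0 gives the sign and the next $\log_2 t$ bits give the bucket. This particular split is an assumption made for illustration, and the paper's exact bit assignment may differ. Python's `%` is used for the modular reduction for brevity.

```python
import random

# Illustrative parameters: the Mersenne prime 2**61 - 1, so hash values are 61-bit strings.
B = 61
P = (1 << B) - 1


class CountSketch:
    """Single-row Count Sketch for the second moment F2 = sum_x f(x)**2 (illustrative).

    Each key is hashed once with a random degree-3 polynomial mod P (4-universal),
    and that one value is split into two parts: bit 0 chooses the sign and the
    next log_t bits choose the bucket. The bit split is an assumption of this sketch.
    """

    def __init__(self, log_t: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.log_t = log_t
        self.counters = [0] * (1 << log_t)  # t = 2**log_t buckets
        self.coeffs = [rng.randrange(P) for _ in range(4)]

    def _hash(self, x: int) -> int:
        # Horner's rule; Python's % stands in for a dedicated Mersenne reduction.
        acc = 0
        for a in reversed(self.coeffs):
            acc = (acc * x + a) % P
        return acc

    def update(self, x: int, delta: int = 1) -> None:
        """Process the stream update f(x) += delta; keys x are assumed to lie in [0, P)."""
        h = self._hash(x)
        sign = 1 if h & 1 else -1                    # substring 1: bit 0
        bucket = (h >> 1) & ((1 << self.log_t) - 1)  # substring 2: bits 1..log_t
        self.counters[bucket] += sign * delta

    def estimate_f2(self) -> int:
        """Sum of squared counters: the single-row estimate of F2."""
        return sum(c * c for c in self.counters)
```

Because the all-ones value $p$ never occurs, the extracted bits are very slightly non-uniform; the abstract states that splitting nonetheless preserves strong theoretical qualities. In practice, several independent rows are usually combined (for example by taking a median) to reduce the variance of the $F_2$ estimate.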
