论文标题

具有对数搜索时间的语法压缩索引

Grammar-Compressed Indexes with Logarithmic Search Time

论文作者

Claude, Francisco, Navarro, Gonzalo, Pacheco, Alejandro

论文摘要

令文本$ t [1..n] $为$ g $(终端和非终端)符号和尺寸$ g $(以规则右侧的长度的总和为衡量的$ g $(端子和非终端)符号)生成的唯一字符串。这样的语法,称为$ t $的语法压缩表示形式,可以使用$ g \ lg g $位编码。我们介绍了第一个使用$ o(g \ lg n)$位的语法压缩索引,并可以找到$ occ $ occ $ forters $ p [1..m] $ in time $ o(((m^2+occ)\ lg g)$。我们实施该索引,并在高度重复的文本收集中与艺术状态相比,证明其实用性。

Let a text $T[1..n]$ be the only string generated by a context-free grammar with $g$ (terminal and nonterminal) symbols, and of size $G$ (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of $T$, can be encoded using essentially $G\lg g$ bits. We introduce the first grammar-compressed index that uses $O(G\lg n)$ bits and can find the $occ$ occurrences of patterns $P[1..m]$ in time $O((m^2+occ)\lg G)$. We implement the index and demonstrate its practicality in comparison with the state of the art, on highly repetitive text collections.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源