论文标题
在运行长度的编码字符串上缺乏最小的单词
Minimal Absent Words on Run-Length Encoded Strings
论文作者
论文摘要
如果没有发生$ w $(作为$ t $的子字符串),则称为另一个字符串$ t $的字符串$ W $称为另一个字符串$ t $的最小单词(MAW),并且在$ t $中出现任何适当的$ W $的符号。用于报告$ \ mathsf {maw}(t)的最新数据结构,从给定的字符串$ t的长度$ n $ of $ n $ requient $ o(n)$ space,可以用$ o(n)$时间来构建,并且可以在$ o(| \ m nathsf {maw}(maw}(maw}(maw}(t))$上,可以在$ o(n)$ time中构建。本文启动了从字符串的压缩表示中计算MAW的问题。特别是,我们专注于字符串,运行长度编码(RLE)的最基本的压缩表示,该表示代表相同字符$ a $ by $ a^p $的每个最大运行,其中$ p $ $ p $是运行的长度。令$ m $为字符串$ t $的rle大小。将MAWS分类为五个不相交集合之后$ \ Mathcal {m} _i $ for $ i = 1,2,4,5 $在rle-size $ m $方面,除了$ \ nathcal {m} _3 $,其大小不受$ m $不限制的_3 $。然后,我们提出一个紧凑型$ O(m)$ - 空间数据结构,可以在最佳$ o(| \ Mathsf {maw}(t)|)$ time中报告所有maws。
A string $w$ is called a minimal absent word (MAW) for another string $T$ if $w$ does not occur (as a substring) in $T$ and any proper substring of $w$ occurs in $T$. State-of-the-art data structures for reporting the set $\mathsf{MAW}(T)$ of MAWs from a given string $T$ of length $n$ require $O(n)$ space, can be built in $O(n)$ time, and can report all MAWs in $O(|\mathsf{MAW}(T)|)$ time upon a query. This paper initiates the problem of computing MAWs from a compressed representation of a string. In particular, we focus on the most basic compressed representation of a string, run-length encoding (RLE), which represents each maximal run of the same characters $a$ by $a^p$ where $p$ is the length of the run. Let $m$ be the RLE-size of string $T$. After categorizing the MAWs into five disjoint sets $\mathcal{M}_1$, $\mathcal{M}_2$, $\mathcal{M}_3$, $\mathcal{M}_4$, $\mathcal{M}_5$ using RLE, we present matching upper and lower bounds for the number of MAWs in $\mathcal{M}_i$ for $i = 1,2,4,5$ in terms of RLE-size $m$, except for $\mathcal{M}_3$ whose size is unbounded by $m$. We then present a compact $O(m)$-space data structure that can report all MAWs in optimal $O(|\mathsf{MAW}(T)|)$ time.