Paper Title
Constant-delay enumeration for SLP-compressed documents
Paper Authors
Paper Abstract
We study the problem of enumerating the results of a query over a compressed document. Our compression model is the straight-line program (SLP), which is defined by a context-free grammar that produces a single string. For our queries, we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This model extends the notion of Regular Spanners, as it allows arbitrarily long outputs. Our main result is an algorithm that evaluates such a query by enumerating all results with output-linear delay after a preprocessing phase that takes linear time in the size of the SLP and cubic time in the size of the automaton. This is an improvement over Schmid and Schweikardt's result, which, with the same preprocessing time, enumerates with a delay that is logarithmic in the size of the uncompressed document. We achieve this through a persistent data structure named Enumerable Compact Sets with Shifts, which guarantees output-linear delay under certain restrictions. These results imply constant-delay enumeration algorithms in the context of regular spanners. Further, we use an extension of annotated automata that utilizes succinctly encoded annotations to save an exponential factor over previous results on constant-delay enumeration for vset automata. Lastly, we extend our results in the same fashion as Schmid and Schweikardt to allow complex document editing while maintaining the constant-delay guarantee.
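As a concrete illustration of the compression model only (this is not code from the paper), the Python sketch below shows a small SLP in the usual normal form, where each nonterminal derives either a single terminal or the concatenation of two nonterminals; the rule names and helper functions are our own and chosen for illustration.

# Illustrative sketch of a straight-line program (SLP): a context-free
# grammar in which every nonterminal has exactly one rule, so the grammar
# derives a single string. Names here are hypothetical, not from the paper.

from functools import lru_cache

# Each nonterminal maps either to a terminal character or to a pair of
# previously defined nonterminals (a common normal form for SLPs).
slp = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),     # derives "ab"
    "X2": ("X1", "X1"),   # derives "abab"
    "X3": ("X2", "X2"),   # derives "abababab"
    # n doubling rules derive a string of length 2^n from O(n) rules,
    # which is why an SLP can be exponentially smaller than the document.
}

def expand(symbol: str) -> str:
    """Decompress: derive the (possibly huge) string of a nonterminal."""
    rule = slp[symbol]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(left) + expand(right)

@lru_cache(maxsize=None)
def derived_length(symbol: str) -> int:
    """Length of the derived string, computed in time linear in the size
    of the SLP without ever materializing the uncompressed document."""
    rule = slp[symbol]
    if isinstance(rule, str):
        return len(rule)
    left, right = rule
    return derived_length(left) + derived_length(right)

if __name__ == "__main__":
    assert expand("X3") == "abababab"
    assert derived_length("X3") == 8

The point of the sketch is only that simple quantities (here, string length) can be computed directly on the grammar; the paper's contribution is to run annotated-automaton queries over such grammars and enumerate their outputs with output-linear delay, without decompressing the document.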