Title
Grammar compression with probabilistic context-free grammar
Authors
Abstract
We propose a new approach to universal lossless text compression, based on grammar compression. In the literature, a target string $T$ has been compressed as a context-free grammar $G$ in Chomsky normal form satisfying $L(G) = \{T\}$. Such a grammar is often called a \emph{straight-line program} (SLP). In this paper, we consider a probabilistic grammar $G$ that generates $T$, but not necessarily as a unique element of $L(G)$. In order to recover the original text $T$ unambiguously, we keep both the grammar $G$ and the derivation tree of $T$ from the start symbol of $G$, in compressed form. We present simple evidence, both theoretical and practical, that for certain texts our proposal is indeed more efficient than SLPs.
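The distinction the abstract draws can be illustrated with a minimal sketch (not the paper's actual construction, and with hypothetical rule sets): in an SLP every nonterminal has exactly one rule, so expansion is unambiguous; in a grammar with several rules per nonterminal, the sequence of rule choices (a linearised derivation tree) must be stored alongside the grammar to recover $T$.

```python
# SLP-style grammar: one rule per nonterminal, so L(G) = {T}.
slp = {
    "S": ("A", "A"),   # S -> A A
    "A": ("a", "b"),   # A -> a b  (lowercase symbols are terminals)
}

def expand_slp(sym, rules):
    """Expand an SLP symbol; uniqueness needs no extra information."""
    if sym not in rules:
        return sym  # terminal
    return "".join(expand_slp(s, rules) for s in rules[sym])

# Ambiguous grammar: a nonterminal may have several rules, so L(G) != {T}.
# To recover T we additionally store which rule was applied at each
# expansion, read off the derivation tree in preorder.
ambiguous = {
    "S": [("A", "A")],
    "A": [("a", "b"), ("b", "a")],  # two choices: derivation must be kept
}

def expand_with_derivation(sym, rules, choices):
    """Expand using a list of rule indices consumed in preorder."""
    if sym not in rules:
        return sym  # terminal
    rule = rules[sym][choices.pop(0)]
    return "".join(expand_with_derivation(s, rules, choices) for s in rule)

print(expand_slp("S", slp))                               # abab
print(expand_with_derivation("S", ambiguous, [0, 0, 1]))  # abba
```

The compressed representation in the second case is the pair (grammar, choice sequence); the hope, as the abstract argues, is that for certain texts this pair is smaller than any SLP generating the same string.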