论文标题

编辑距离和LCS的不对称流算法

Asymmetric Streaming Algorithms for Edit Distance and LCS

论文作者

Farhadi, Alireza, Hajiaghayi, MohammadTaghi, Rubinstein, Aviad, Seddighin, Saeed

论文摘要

编辑距离(ED)和最长的常见子序列(LCS)是两个基本问题,它们量化了两个字符串彼此相似的方式。在本文中,我们在Andoni等人引入的不对称流模型中考虑了这些问题。 (焦点10)和saks和seshadhri(苏打13)。在此模型中,我们可以随机访问一个字符串,并流式传输访问另一个字符串。我们的主要贡献是ED的常数因子近似算法,对于任何常数$δ> 0 $,具有$ \ tilde o(n^δ)$的内存。除此之外,我们还向$ \ tildeo_ε(\ sqrt {n})$的上限介绍了在$ 1+ε$中近似ED或LCS所需的内存。我们所有的算法都是确定性的,并在单个通行证中运行。 对于在恒定因素中近似ED,我们发现了三角形不等式的另一种应用,这一次是在流算法的背景下。先前已使用三角形不等式来获得ED的亚缩写时间近似算法。我们的技术是新颖的,优雅地利用了三角形不平等,以牺牲运行时的指数增加为代价。

The edit distance (ED) and longest common subsequence (LCS) are two fundamental problems which quantify how similar two strings are to one another. In this paper, we consider these problems in the asymmetric streaming model introduced by Andoni et al. (FOCS'10) and Saks and Seshadhri (SODA'13). In this model we have random access to one string and streaming access the other string. Our main contribution is a constant factor approximation algorithm for ED with the memory of $\tilde O(n^δ)$ for any constant $δ> 0$. In addition to this, we present an upper bound of $\tilde O_ε(\sqrt{n})$ on the memory needed to approximate ED or LCS within a factor $1+ε$. All our algorithms are deterministic and run in a single pass. For approximating ED within a constant factor, we discover yet another application of triangle inequality, this time in the context of streaming algorithms. Triangle inequality has been previously used to obtain subquadratic time approximation algorithms for ED. Our technique is novel and elegantly utilizes triangle inequality to save memory at the expense of an exponential increase in the runtime.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源