Paper Title
A Report on Achieving Complete Regular-Expression Matching using Mealy Machines
Paper Authors
Paper Abstract
While regexp matching is a powerful mechanism for finding patterns in data streams, regexp engines generally find only matches that do not overlap. Moreover, they often rely on some form of nondeterministic exploration in which input symbols are processed more than once, which can be costly in real-time matching. We present an algorithm that constructs, from any regexp, a Mealy machine that finds all matches while reading each input symbol only once. The computed machine can also detect and distinguish different patterns or sub-patterns inside patterns. Additionally, by formalizing Mealy machines in terms of regular languages, we show how to compute a minimal Mealy machine via a variation of DFA minimization.
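To make the contrast between non-overlapping and complete matching concrete, here is a minimal Python sketch. It is not the construction from the paper: the transition table is written by hand for the single pattern `aba`, and the names `TRANSITIONS` and `match_ends` are hypothetical. It only illustrates the behavior the abstract describes, namely a Mealy machine that emits an output on every transition and reads each input symbol exactly once.

```python
import re

# Hand-built Mealy machine for the pattern "aba" over the alphabet {a, b}.
# This table is an illustrative assumption, not the paper's algorithm.
# A state records the longest suffix of the input read so far that is
# also a prefix of "aba":  0 = "",  1 = "a",  2 = "ab".
# Each entry maps (state, symbol) -> (next_state, output), where
# output 1 means "a match ends at this symbol".
TRANSITIONS = {
    (0, "a"): (1, 0),
    (0, "b"): (0, 0),
    (1, "a"): (1, 0),
    (1, "b"): (2, 0),
    (2, "a"): (1, 1),  # "aba" completed; keep suffix "a" so overlaps are found
    (2, "b"): (0, 0),
}


def match_ends(text):
    """Return the index of every symbol at which a match of "aba" ends."""
    state = 0
    ends = []
    for i, symbol in enumerate(text):  # each symbol is read exactly once
        state, output = TRANSITIONS[(state, symbol)]
        if output:
            ends.append(i)
    return ends


text = "ababa"

# Python's re.finditer reports only non-overlapping matches.
non_overlapping_starts = [m.start() for m in re.finditer("aba", text)]

# The Mealy machine reports every match, including overlapping ones.
all_match_ends = match_ends(text)

print(non_overlapping_starts)  # [0]
print(all_match_ends)          # [2, 4]
```

On the input `ababa`, the standard engine finds one match starting at index 0, while the machine reports matches ending at indices 2 and 4, the second of which overlaps the first. A machine for an arbitrary regexp, or one that distinguishes sub-patterns, would need a richer output alphabet than the 0/1 used here.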