Paper title

Pika parsing: reformulating packrat parsing as a dynamic programming algorithm solves the left recursion and error recovery problems

Author

Hutchison, Luke A. D.

Abstract

A recursive descent parser is built from a set of mutually-recursive functions, where each function directly implements one of the nonterminals of a grammar. A packrat parser uses memoization to reduce the time complexity for recursive descent parsing from exponential to linear in the length of the input. Recursive descent parsers are extremely simple to write, but suffer from two significant problems: (i) left-recursive grammars cause the parser to get stuck in infinite recursion, and (ii) it can be difficult or impossible to optimally recover the parse state and continue parsing after a syntax error. Both problems are solved by the pika parser, a novel reformulation of packrat parsing as a dynamic programming algorithm, which requires parsing the input in reverse: bottom-up and right to left, rather than top-down and left to right. This reversed parsing order enables pika parsers to handle grammars that use either direct or indirect left recursion to achieve left associativity, simplifying grammar writing, and also enables optimal recovery from syntax errors, which is a crucial property for IDEs and compilers. Pika parsing maintains the linear-time performance characteristics of packrat parsing as a function of input length. The pika parser was benchmarked against the widely-used Parboiled2 and ANTLR4 parsing libraries. The pika parser performed significantly better than the other parsers for an expression grammar, although for a complex grammar implementing the Java language specification, a large constant performance impact was incurred per input character. Therefore, if performance is important, pika parsing is best applied to simple to moderate-sized grammars, or to very large inputs, if other parsing alternatives do not scale linearly in the length of the input. Several new insights into precedence, associativity, and left recursion are presented.
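To make the abstract's starting point concrete, the following is a minimal sketch of a conventional top-down packrat parser, that is, the approach pika parsing reformulates and not the pika algorithm itself. The toy grammar and the names `make_parser`, `num`, `sum_`, and `parse_sum` are hypothetical and chosen only for illustration. Memoization is done with Python's `functools.lru_cache`, keyed on the start position of each rule.

```python
from functools import lru_cache

DIGITS = "0123456789"


def make_parser(text):
    """Build a memoized recursive descent (packrat-style) parser for `text`.

    Hypothetical toy grammar (an assumption, not taken from the paper):
        Sum <- Num '+' Sum / Num
        Num <- [0-9]+
    Each rule function takes a start position and returns
    (value, end_position) on success or None on failure.
    """

    @lru_cache(maxsize=None)  # memo table: one entry per start position
    def num(pos):
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @lru_cache(maxsize=None)  # memo table: one entry per start position
    def sum_(pos):
        left = num(pos)
        if left is None:
            return None
        value, end = left
        # First alternative: Num '+' Sum (right-recursive, so input is
        # consumed before the rule calls itself again).
        if end < len(text) and text[end] == "+":
            rest = sum_(end + 1)
            if rest is not None:
                return (value + rest[0], rest[1])
        # Second alternative: Num
        return left

    return sum_


def parse_sum(text):
    """Parse the whole input as a Sum and return its integer value."""
    result = make_parser(text)(0)
    if result is None or result[1] != len(text):
        raise ValueError("syntax error")
    return result[0]
```

Calling `parse_sum("1+2+3")` evaluates the sum by recursing to the right. If the rule were instead written left-recursively, for example `Sum <- Sum '+' Num`, then `sum_(pos)` would call itself at the same position before consuming any input and never terminate, which is problem (i) in the abstract. The sketch also simply raises on the first syntax error rather than recovering, which corresponds to problem (ii).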
