I've seen some claims that optimized PEG parsers in general cannot be faster than optimized LALR(1) or LL(k) parsers. (Of course, performance of parsing would depend on a particular grammar.)
I'd like to know if there are any specific limitations of PEG parsers, either valid in general or for some subsets of PEG grammars that would make them inferior to LALR(1) or LL(k) performance-wise.
In particular, I'm interested in parser generators, but assume that their output can be tweaked for performance in any particular case. I also assume that parsers are optimized and it is possible to tweak a particular grammar a bit if that's needed to improve performance.
An LALR(1) parser is an "upgraded" version of an LR(0) parser that keeps track of more precise information to disambiguate the grammar. An LR(1) parser is a significantly more powerful parser that keeps track of even more precise information than an LALR(1) parser.
True, Canonical LR parser is more powerful than LALR parser. The number of states in the parsing automaton differs between the CLR and LALR algorithms. CLR parsers have a much larger number of states than LALR parsers. Statement C: LALR parser is more powerful than Canonical LR parser.
Explanation: Canonical LR is the most powerful parser as compared to other LR parsers.
CLR parsing use the canonical collection of LR (1) items to build the CLR (1) parsing table. CLR (1) parsing table produces the more number of states as compare to the SLR (1) parsing. In the CLR (1), we place the reduce node only in the lookahead symbols.
Found a good answer about Packrat vs LALR parsing. Some quotes from it:
L(AL)R parsers are linear time parsers, too. So in theory, neither packrat nor L(AL)R parsers are "faster".
What matters, in practice, of course, is implementation. L(AL)R state transitions can be executed in very few machine instructions ("look token code up in vector, get next state and action") so they can be extremely fast in practice.
An observation: most language front-ends don't spend most of their time "parsing"; rather, they spend a lot of time in lexical analysis. Optimize that ..., and the parser speed won't matter much.
PEG
parsers can use unlimited lookahead (while maintaining linear parse time on average, via packrat) unlike (default) LL(k)
, or LR(k)
parsers which use limited lookahead, while maintining linear parse time.
Lately (2014-2015) ANTLR4
has made extensions to handle arbitrary lookahead (as in PEG
) while maintaining linear parse time on average (said to be more efficient than packrat
algorithm), however this is incorporates new extensions and variations of the LR
parsing algorithm (and not the default LR
algorithm).
The packrat
parser (and associated parsers for LL
, LR
) is not necesarily practical, but provides theoretical bounds on parsing so comparison can be made.
But note that unlimited lookahead can be used to parse grammars/languages in linear time (e.g via packrat
or antlr
) which are not possible to parse via LL(k)
or LR(k)
even in non-linear time, So it is important to understand what is compared to what.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With