Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of parsers: PEG vs LALR(1) or LL(k)

I've seen some claims that optimized PEG parsers in general cannot be faster than optimized LALR(1) or LL(k) parsers. (Of course, performance of parsing would depend on a particular grammar.)

I'd like to know if there are any specific limitations of PEG parsers, either valid in general or for some subsets of PEG grammars that would make them inferior to LALR(1) or LL(k) performance-wise.

In particular, I'm interested in parser generators, but assume that their output can be tweaked for performance in any particular case. I also assume that parsers are optimized and it is possible to tweak a particular grammar a bit if that's needed to improve performance.

like image 707
Roman Boiko Avatar asked Jul 07 '12 08:07

Roman Boiko


People also ask

What is the difference between LALR 1 and LR 1 parsing?

An LALR(1) parser is an "upgraded" version of an LR(0) parser that keeps track of more precise information to disambiguate the grammar. An LR(1) parser is a significantly more powerful parser that keeps track of even more precise information than an LALR(1) parser.

Which is the most powerful LR parser and why?

True, Canonical LR parser is more powerful than LALR parser. The number of states in the parsing automaton differs between the CLR and LALR algorithms. CLR parsers have a much larger number of states than LALR parsers. Statement C: LALR parser is more powerful than Canonical LR parser.

Which is the most powerful LR parser?

Explanation: Canonical LR is the most powerful parser as compared to other LR parsers.

Why are CLR 1 parsers more powerful than SLR 1 parsers explain?

CLR parsing use the canonical collection of LR (1) items to build the CLR (1) parsing table. CLR (1) parsing table produces the more number of states as compare to the SLR (1) parsing. In the CLR (1), we place the reduce node only in the lookahead symbols.


2 Answers

Found a good answer about Packrat vs LALR parsing. Some quotes from it:

L(AL)R parsers are linear time parsers, too. So in theory, neither packrat nor L(AL)R parsers are "faster".

What matters, in practice, of course, is implementation. L(AL)R state transitions can be executed in very few machine instructions ("look token code up in vector, get next state and action") so they can be extremely fast in practice.

An observation: most language front-ends don't spend most of their time "parsing"; rather, they spend a lot of time in lexical analysis. Optimize that ..., and the parser speed won't matter much.

like image 116
Roman Boiko Avatar answered Sep 19 '22 05:09

Roman Boiko


PEG parsers can use unlimited lookahead (while maintaining linear parse time on average, via packrat) unlike (default) LL(k), or LR(k) parsers which use limited lookahead, while maintining linear parse time.

Lately (2014-2015) ANTLR4 has made extensions to handle arbitrary lookahead (as in PEG) while maintaining linear parse time on average (said to be more efficient than packrat algorithm), however this is incorporates new extensions and variations of the LR parsing algorithm (and not the default LR algorithm).

The packrat parser (and associated parsers for LL, LR) is not necesarily practical, but provides theoretical bounds on parsing so comparison can be made.

But note that unlimited lookahead can be used to parse grammars/languages in linear time (e.g via packrat or antlr) which are not possible to parse via LL(k) or LR(k) even in non-linear time, So it is important to understand what is compared to what.

like image 21
Nikos M. Avatar answered Sep 20 '22 05:09

Nikos M.