What advantages do LL parsers have over LR parsers to warrant their relative popularity in today's parser generator tools?
According to Wikipedia, LR parsing appears to have advantages over LL:
LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible. This is in contrast to an LL(k) (or even worse, an LL(*) parser) which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.
Note: This is not homework. I was just surprised when I found out that Antlr is an LL parser generator (despite having "LR" in its name!).
LR(k) is stronger than LL(k) because it provisionally matches multiple productions at the same time---each parser state encodes a set of partially-recognized productions. The parser is not required to choose between them until the right edge of the winning production.
At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol. An LL parse is a left-to-right, leftmost derivation.
LR parsers can parse a strictly larger class of grammars than (top-down) predictive parsers. LR parsers can usually recognize all programming language construct that can be specified by context-free grammars. LR parsers detect errors fast. Drawback: it is too much work to construct an LR parser by hand.
GLR is great if you want a parse tree/forest and don't mind black boxes. It lets you type in whatever CFG you want at the cost of checking for ambiguities at parse time via exhaustive testing, instead of resolving LR/LALR conflicts statically. Some say that's a good trade-off. Ira Baxter's DMS tool or Elkhound, which has a free C++ grammar, are useful for this class of problem. ANTLR is useful for a large class of language applications too, but uses a top-down approach, generating recursive descent parsers called LL(*) that allow semantic predicates. I will state without proof here that predicates allow you to parse context-sensitive languages beyond CFGs. Programmers like to insert actions into grammars, like good error handling, and like to single-step debug. LL is good at all three. LL is what we do by hand so it's easier to understand. Don't believe the wikipedia nonsense about LR being better at handling errors. That said, if you backtrack a lot with ANTLR, errors are indeed worse with LL(*) (PEGs have this problem).
Re backtracking. GLR speculates (i.e. backtracks) too, just like PEGs, ANTLR, and any other non-deterministic strategy. At any non-deterministic LR state, GLR "forks" sub-parsers to try out any viable path. Anyway, LL has good context for error handling. Where LR knows it's matching an expression, LL knows it's an expression in an assignment or IF
-conditional; LR knows it could be in either but isn't sure - and that uncertainty is where it gets its power.
GLR is O(n^3)
worst case. packrat/PEG is O(n)
worst case. ANTLR's are O(n^2)
due to cyclic lookahead DFA but O(n)
in practice. Doesn't matter really. GLR is fast enough.
ANTLR is ANother Tool for Lang Recognition not anti-LR, but I like that one too ;)
Frankly, like a lot of young coders in 80s, I didn't understand LALR and didn't like black boxes (now I dig the beauty of the GLR engine but still prefer LL). I built a commercial LL(k) based compiler and decided to build a tool to generate what I had built by hand. ANTLR isn't for everyone and edge cases like C++ might be better handled with GLR but a lot of people find ANTLR fits into their comfort zone. Since Jan 2008, there have been 134,000 downloads of ANTLR's binary jar, within ANTLRWorks, and source zips total (according to Google Analytics). See our paper on LL(*) with lots of empirical data.
If you have to hand code one, recursive descent (LL) is something you can do realistically; people cannot hand-build L(AL)R parsers practically by hand.
Given that modern parser generators will handle all the parser construction for you, and that space is not much of an issue, I prefer LR parsers because you don't have to fight with the grammars as much to make them valid for your particular parser generator (no "remove all the left recursion" silliness).
In fact, I prefer GLR parsers, which will pretty much parse anything with a context free grammar. No left-recursion worries. No shift/reduce conflict worries. No lookahead limits.
If you want to see the range of languages that one GLR parsing engine can handle (including the famously hard-to-parse-using-LL/LALR language, C++), you can look here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With