Practical consequences of formal grammar power?

Tags: parsing, theory

Every undergraduate Intro to Compilers course reviews the commonly implemented subsets of context-free grammars: LL(k), SLR(k), LALR(k), LR(k). We are also taught that for any given k, each of those grammar classes is a subset of the next.

What I've never seen is an explanation of what sorts of programming language syntactic features might require moving to a different language class. There's an obvious practical motivation for GLR parsers, namely, avoiding an unholy commingling of parser and symbol table when parsing C++. But what about the differences between the two "standard" classes, LL and LR?

Two questions:

1. What (general) syntactic constructions can be parsed with LR(k) but not LL(k')?
2. In what ways, if any, do those constructions manifest as desirable language constructs?

There's a plausible argument for reducing language power by making k as small as possible, because a language requiring many, many tokens of lookahead will be harder for humans to parse, as well as "harder" for machines to parse. Question (2) implicitly asks if the same reasoning ends up holding between classes, as well as within a class.

edit: Here's one example to illustrate the sorts of answers I'm looking for, but for regular languages instead of context-free:

When describing a regular language, one usually gets three operators: +, *, and ?. Now, you can remove + without reducing the power of the language; instead of writing x+, you write xx*, and the effect is the same. But if x is some big and hairy expression, the two xs are likely to diverge over time due to human forgetfulness, yielding a syntactically correct regular expression that doesn't match the original author's intent. Thus, even though adding + doesn't strictly add power, it does make the notation less error-prone.
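To make the regular-expression example concrete, here is a minimal Python sketch. The subexpression `x` and the sample strings are invented purely for illustration. It checks that `x+` and `xx*` accept the same strings, and then shows how a copy of `x` that has drifted changes the language without producing any syntax error.

```python
import re

# Hypothetical "big and hairy" subexpression, made up for this illustration:
# an identifier followed by a dot.
x = r"(?:[A-Za-z_][A-Za-z0-9_]*\.)"

plus_form = re.compile(f"{x}+")      # x+
star_form = re.compile(f"{x}{x}*")   # xx*

# The same idea after one copy of x was edited and the other was forgotten:
# the second copy no longer allows a leading underscore.
drifted_form = re.compile(
    r"(?:[A-Za-z_][A-Za-z0-9_]*\.)(?:[A-Za-z][A-Za-z0-9_]*\.)*"
)

samples = ["", "a.", "a.b.", "a.b.c", "a..", "9."]


def matches(pattern, text):
    """True if the whole string is in the language of the pattern."""
    return pattern.fullmatch(text) is not None


plus_results = [matches(plus_form, s) for s in samples]
star_results = [matches(star_form, s) for s in samples]

# x+ and xx* agree on every sample; the drifted version silently disagrees
# on an input such as "a._b.".
print(plus_results == star_results)
print(matches(plus_form, "a._b."), matches(drifted_form, "a._b."))
```

The drifted pattern is still a perfectly valid regular expression, which is exactly why the mistake is easy to miss.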

Are there constructs with similar practical (human?) effects that must be "removed" when switching from LR to LL?

asked Dec 16 '09 01:12

Ben Karel

1 Answer

Parsing (I claim) is a bit like sorting: a problem that was the focus of a lot of thought in the early days of CS, leading to a set of well-understood solutions with some nice theoretical results.

My claim is that the picture that we get (or give, for those of us who teach) in a compilers class is, to some degree, a beautiful answer to the wrong question.

To answer your question more directly, an LL(1) grammar can't express all kinds of things that you might want to parse; the "natural" formulation of an 'if' with an optional 'else', for instance.
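One way to see the if/else problem is to compute FIRST and FOLLOW sets for a left-factored version of the grammar. The sketch below is an illustration only: the grammar, the token names, and the helper functions are assumptions chosen for the example, not taken from any particular parser generator. An LL(1) parser needs FIRST of the 'else' alternative to be disjoint from FOLLOW of the nonterminal that may also be empty, and here it is not.

```python
EPS = ""   # marker for the empty string
END = "$"  # end-of-input marker

# Left-factored if/else grammar (assumed for illustration):
#   S -> if c then S E | other
#   E -> else S | <empty>
GRAMMAR = {
    "S": [["if", "c", "then", "S", "E"], ["other"]],
    "E": [["else", "S"], []],
}
START = "S"


def first_of_sequence(symbols, first):
    """FIRST set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol in the sequence can be empty
    return result


def compute_first():
    """Iterate to a fixed point over all productions."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                new = first_of_sequence(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    """Iterate to a fixed point; FOLLOW(START) contains the end marker."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_sequence(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)

# LL(1) requires FIRST(else S) and FOLLOW(E) to be disjoint, because E can
# also derive the empty string. A non-empty intersection is a conflict.
conflict = first_of_sequence(["else", "S"], FIRST) & FOLLOW["E"]
print(conflict)
```

In practice, hand-written and generated parsers usually resolve this by always attaching the 'else' to the nearest 'if', which is a rule outside the grammar itself.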

But wait! Can't I reformulate my grammar as an LL(1) grammar and then patch up the source tree by walking over it afterward? Sure you can! To some degree, this is what makes the question of what kind of grammar your parser uses largely moot.
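As a small sketch of that "reformulate, then patch up" idea, consider subtraction. The natural left-recursive rule `expr -> expr '-' NUM | NUM` is not LL(1), so the example below assumes an LL(1)-friendly rewrite, parses with plain recursive descent, and then rebuilds the left-associative tree afterward. The grammar, function names, and tree shapes are all illustrative assumptions.

```python
def tokenize(text):
    """Whitespace-separated tokens; enough for this illustration."""
    return text.split()


# LL(1)-friendly grammar (assumed for illustration):
#   expr -> NUM rest
#   rest -> '-' NUM rest | <empty>
def parse_expr(tokens, pos=0):
    head = int(tokens[pos])
    tail, pos = parse_rest(tokens, pos + 1)
    return (head, tail), pos


def parse_rest(tokens, pos):
    if pos < len(tokens) and tokens[pos] == "-":
        operand = int(tokens[pos + 1])
        tail, pos = parse_rest(tokens, pos + 2)
        return (operand, tail), pos
    return None, pos  # the empty alternative


def fix_up(tree):
    """Rebuild the left-associative tree the left-recursive grammar implies."""
    head, tail = tree
    result = head
    while tail is not None:
        operand, tail = tail
        result = ("-", result, operand)
    return result


def evaluate(node):
    if isinstance(node, int):
        return node
    _, left, right = node
    return evaluate(left) - evaluate(right)


raw, _ = parse_expr(tokenize("10 - 3 - 2"))
tree = fix_up(raw)
print(tree)
print(evaluate(tree))
```

The raw parse is a right-nested chain, which would give the wrong answer if evaluated directly as nested subtraction; the fix-up pass restores the grouping (10 - 3) - 2.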

Also, back when I was an undergraduate (1990-94), whitespace-sensitive grammars were clearly the work of the Devil; now, Python and Haskell's designs are bringing whitespace-sensitivity back into the light. Also, Packrat parsing says "to heck with your theoretical purity: I'm just going to define a parser as a set of rules, and I don't care what class my grammar belongs to." (paraphrased)
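For a rough feel of the packrat approach, here is a minimal sketch of a memoized recursive-descent parser for an if/else rule written as ordered choice. It is not a real packrat library: the rule, the token names, and the `parse` function are assumptions made for this example. Because the alternatives are tried in order, the if/else case is never ambiguous, and memoizing results per input position means the second alternative reuses work done by the first.

```python
from functools import lru_cache


def parse(tokens):
    tokens = tuple(tokens)

    def lit(pos, *words):
        """Match literal tokens starting at pos; return new position or None."""
        for word in words:
            if pos >= len(tokens) or tokens[pos] != word:
                return None
            pos += 1
        return pos

    @lru_cache(maxsize=None)  # one cached result per input position
    def stmt(pos):
        # Alternative 1: 'if' 'c' 'then' stmt 'else' stmt
        p = lit(pos, "if", "c", "then")
        if p is not None:
            then_part = stmt(p)
            if then_part is not None:
                q = lit(then_part[1], "else")
                if q is not None:
                    else_part = stmt(q)
                    if else_part is not None:
                        return ("if", then_part[0], else_part[0]), else_part[1]
        # Alternative 2: 'if' 'c' 'then' stmt (stmt(p) comes from the cache)
        p = lit(pos, "if", "c", "then")
        if p is not None:
            then_part = stmt(p)
            if then_part is not None:
                return ("if", then_part[0], None), then_part[1]
        # Alternative 3: 'other'
        p = lit(pos, "other")
        if p is not None:
            return "other", p
        return None

    result = stmt(0)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("input does not match the grammar")
    return result[0]


tree = parse("if c then if c then other else other".split())
print(tree)
```

In this sketch the 'else' ends up attached to the inner 'if', simply because the first alternative is tried first at every position.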

In summary, I would agree with what I believe to be your implied suggestion: in 2009, a clear understanding of the difference between the classes LL(k) and LR(k) is less important in itself than the ability to formulate and debug a grammar that makes your parser generator happy.

answered Sep 28 '22 19:09

John Clements
