Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ambiguity resolution when making a C++ parser

Tags:

c++

parsing

lalr

I wrote a LALR(1) parser for C++17. I found 156 ambiguities, some of them I can resolve it according to standard but the others I can't.

For example: Shift-Reduce conflict occurs while parsing "operator+ < ......" when a less-than is encountered:

We may parse it as:

(1)

template-id -> operator-function-id · < ...... >

or:

(2)

unqualified-id -> operator-function-id · where (1) need shifting but (2) need reducing.

However, the standard have:

After name lookup (3.4) finds that a name is a template-name or that an operator-function-id or a literaloperator-id refers to a set of overloaded functions any member of which is a function template, if this is followed by a <, the < is always taken as the delimiter of a template-argument-list and never as the less-than operator. When parsing a template-argument-list, the first non-nested >137 is taken as the ending delimiter rather than a greater-than operator.

So we choose to shift.

Unfortunately, there are many ambiguities I can't find the resolution. Here I list some of them(Some of them can clearly make choice but I just can't find the proof):

  1. Are there some portion in standard that indicates "shifting" is the default choice when ambiguity occurs?

Declarator

(1)when a noptr-declarator is parsed and a left-paren is encountered, I should reduce it according to:

ptr-declarator -> noptr-declarator ·

or shift the left-paren to satisfy:

declarator -> noptr-declarator · parameters-and-qualifiers

parameters-and-qualifiers -> · left-paren parameter-declaration-clause right-paren......

(2)when a declarator-id is parsed and a left-bracket is encountered, I should reduce it according to:

noptr-declarator -> declarator-id · noptr-declarator -> noptr-declarator · \left-bracket ?constant-expression \right-bracket ?attribute-specifier-seq

or shift the left-square to satisfy:

noptr-declarator -> declarator-id ·attribute-specifier-seq

(attribute-specifier-seq is [[.......]])

like image 658
ChungkingExpress Avatar asked Feb 24 '16 09:02

ChungkingExpress


1 Answers

Following up on TonyD's comment: See Why can't C++ be parsed with a LR(1) parser?

In some places, you essentially have to keep the ambiguity produced by parsing, and resolve it by doing name resolution, or equivalently, you have to tangle name resolution into the parsing process. In either case, you have to interpret the standard to determine how the ambiguities should be resolved, and yes, that's a very difficult task.

Then you get to find out what the compilers really do; both GCC and MS have lots of extensions and variations from the standard, both in terms of syntax and semantic interpretations (these produce programs that produce different results under different compilers). Lastly, you get to find what abominations are in the system header files; these are hacks that added by the compiler folks to make their lives convenient, and are very badly documented, if at all.

like image 106
Ira Baxter Avatar answered Sep 20 '22 06:09

Ira Baxter