Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to resolve ambiguity in the definition of an LR(1) grammar?

Tags:

parsing

go

ocaml

I am writing a Golang compiler in OCaml, and argument lists are causing me a bit of a headache. In Go, you can group consecutive parameter names of the same type in the following way:

func f(a, b, c int)  ===  func f(a int, b int, c int)

You can also have a list of types, without parameter names:

func g(int, string, int)

The two styles cannot be mix-and-matched; either all parameters are named or none are.

My issue is that when the parser sees a comma, it doesn't know what to do. In the first example, is a the name of a type or the name of a variable with more variables coming up? The comma has a dual role and I am not sure how to fix this.

I am using the Menhir parser generator tool for OCaml.

Edit: at the moment, my Menhir grammar follows exactly the rules as specified at http://golang.org/ref/spec#Function_types

like image 464
gnuvince Avatar asked Sep 22 '14 14:09

gnuvince


People also ask

Can an ambiguous grammar be LR 1?

No ambiguous grammar is LL(1) or LR(1), so we must either rewrite the grammar to remove the ambiguity or resolve conflicts in the parser table or implementation.

How do you remove ambiguity in a sentence?

Expansion: Adding a word or two to the sentence can remove ambiguity. He finished the race last Thursday. ---> He finished the race on last Thursday.

Can we use ambiguous grammar in LR parsing?

LR parser can be used to parse ambiguous grammars. LR parser resolves the conflicts (shift/reduce or reduce/reduce) in parsing table of ambiguous grammars based on certain rules (precedence and/or associativity of operators) of the grammar.


1 Answers

As written, the go grammar is not LALR(1). In fact, it is not LR(k) for any k. It is, however, unambiguous, so you could successfully parse it with a GLR parser, if you can find one (I'm pretty sure that there are several GLR parser generators for OCAML, but I don't know enough about any of them to recommend one).

If you don't want to (or can't) use a GLR parser, you can do it the same way Russ Cox did in the gccgo compiler, which uses bison. (bison can generate GLR parsers, but Cox doesn't use that feature.) His technique does not rely on the scanner distinguishing between type-names and non-type-names.

Rather, it just accepts parameter lists whose elements are either name_or_type or name name_or_type (actually, there are more possibilities than that, because of the ... syntax, but it doesn't change the general principle.) That's simple, unambiguous and LALR(1), but it is overly-accepting -- it will accept func foo(a, b int, c), for example -- and it does not produce the correct abstract syntax tree because it doesn't attach the type to the list of parameters being declared.

What that means is that once the argument list is fully parsed and is about to be inserted into the AST attached to some function declaration (for example), a semantic scan is performed to fix it up and, if necessary, produce an error message. That scan is done right-to-left over the list of declaration elements, so that the specified type can be propagated to the left.

It's worth noting that the grammar in the reference manual is also overly-accepting, because it does not express the constraint that "either all parameters are named or none are". That constraint could be expressed in an LR(1) grammar -- I'll leave that as an exercise for readers -- but the resulting grammar would be a lot more difficult to understand.

like image 125
rici Avatar answered Oct 20 '22 21:10

rici