Grammar ambiguity: why? (problem is: "(a)" vs "(a-z)")

Question

So I am trying to implement a pretty simple grammar for one-line statements:

# Grammar

   c         : Character c        [a-z0-9-]
  (v)        : Vowel              (= [a,e,u,i,o])
  (c)        : Consonant
  (?)        : Any character (incl. number)
  (l)        : Any alpha char     (= [a-z])
  (n)        : Any integer        (= [0-9])
  (c1-c2)    : Range from char c1 to char c2
  (c1,c2,c3) : List including chars c1, c2 and c3

  Examples:
  h(v)(c)no(l)(l)jj-k(n)
  h(v)(c)no(l)(l)(a)(a)(n)
  h(e-g)allo
  h(e,f,g)allo
  h(x,y,z)uul
  h(x,y,z)(x,y,z)(x,y,z)(x,y,z)uul

I am using the Happy parser generator (http://www.haskell.org/happy/) but for some reason there seems to be some ambiguity problem.

The error message is: "shift/reduce conflicts: 1"

I think the ambiguity is with these two lines:

  | lBracket char rBracket              { (\c -> case c of
                                                 'v' -> TVowel
                                                 'c' -> TConsonant
                                                 'l' -> TLetter
                                                 'n' -> TNumber) $2 }
  | lBracket char hyphen char rBracket  { TRange $2 $4              }

An example case is: "(a)" vs "(a-z)"

The lexer would give the following for the two cases:

(a)   : [CLBracket, CChar 'a', CRBracket]
(a-z) : [CLBracket, CChar 'a', CHyphen, CChar 'z', CRBracket]

What I don't understand is how this can be ambiguous with an LL[2] parser.

In case it helps here is the entire Happy grammar definition:

{

module XHappyParser where

import Data.Char
import Prelude   hiding (lex)
import XLexer
import XString

}

%name parse
%tokentype { Character  }
%error     { parseError }

%token
    lBracket                  { CLBracket   }
    rBracket                  { CRBracket   }
    hyphen                    { CHyphen     }
    question                  { CQuestion   }
    comma                     { CComma      }
    char                      { CChar $$    }

%%

xstring : tokens                            { XString (reverse $1)       }

tokens : token                              { [$1]                       }
       | tokens token                       { $2 : $1                    }

token : char                                { TLiteral $1                }
      | hyphen                              { TLiteral '-'               }
      | lBracket char rBracket              { (\c -> case c of
                                                     'v' -> TVowel
                                                     'c' -> TConsonant
                                                     'l' -> TLetter
                                                     'n' -> TNumber) $2 }
      | lBracket question rBracket          { TAny                      }
      | lBracket char hyphen char rBracket  { TRange $2 $4              }
      | lBracket listitems rBracket         { TList $2                  }

listitems : char                            { [$1]                      }
          | listitems comma char            { $1 ++ [$3]                }

{

parseError :: [Character] -> a
parseError _ = error "parse error"

}

Thank you!

hammar · Accepted Answer

Here's the ambiguity:

token : [...]
      | lBracket char rBracket
      | [...] 
      | lBracket listitems rBracket

listitems : char
          | [...]

Your parser could accept (v) as both TString [TVowel] and TString [TList ['v']], not to mention the missing characters in that case expression.

One possible way of solving it would be to modify your grammar so lists are at least two items, or have some different notation for vowels, consonants, etc.

Thomas M. DuBuisson · Answer

The problem seems to be:

| lBracket char rBracket
...
| lBracket listitems rBracket

or in cleaner syntax:

(c)

Can be a TVowel, TConsonant, TLetter, TNumber (as you know) or a singleton TList.

As the happy manual says, shift reduce usually isn't an issue. You can us precedence to force behavior/remove the warning if you'd like.

Grammar ambiguity: why? (problem is: "(a)" vs "(a-z)")

Tags:

parsing

haskell

grammar

happy

o1iver

2 Answers

hammar

Thomas M. DuBuisson

Recent Activity

Donate For Us

Grammar ambiguity: why? (problem is: "(a)" vs "(a-z)")

Tags:

parsing

haskell

grammar

happy

o1iver

2 Answers

hammar

Thomas M. DuBuisson

Related questions

Recent Activity

Donate For Us