Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing left-recursive grammar in sum-type inifinite recursion

I'm trying to write a parser for the Tiger language from modern compiler implementation in ML, and am stuck on one of the recursive types.

I have the following type

data LValue =                                                                                                                       
    Id Atom                                                                                                                         
    | RecordAccess LValue Atom                                                                                                      
    | ArraySubscript LValue Expression  

with the following grammar:

lvalue -> id
       -> lvalue.id
       -> lvalue[exp]
id -> atom
exp -> (the big, overarching, everything-is-an-expression type)

And I'm trying to parse it with Parsec, but I'm getting stuck in an infinitely recursive loop. Here's my current base parser:

lvalueParser :: Parsec String () LValue                                                                                             
lvalueParser =                                                                                                                      
    try (Id <$> (atomParser <* (notFollowedBy (char '.'))))                                                                         
    <|> try recordAccessParser                                                                                                      
    where recordAccessParser = (uncurry RecordAccess) <$> do {                                                                      
      record <- lvalueParser;                                                                                                         
      char '.';                                                                                                                     
      atom <- atomParser;                                                                                                           
      return (record, atom)                                                                                                      
      }

(Note: I haven't yet attempted to implement anything to handle the ArrayAccess portion)

Clearly, the infinite-loopiness happens when the recordAccessParser calls back to the lvalueParser.

I can change the recordAccessParser thusly:

recordAccessParser = (uncurry RecordAccess) <$> do {                                                                      
          record <- atomParser;                                                                                                         
          char '.';                                                                                                                     
          atom <- atomParser;                                                                                                           
          return (Id record, atom)                                                                                                      
          }

and it terminates. However, it will not parse record access more than a single level deep:

Parsec.parse lvalueParser "" "record_1.field_1.field_2"
#=> RecordAccess (Id record_1) (Id field_1)

And I expect

#=> RecordAccess (RecordAccess (Id record_1) (Id field_1)) (Id field_2)

I looked at chainl1, but the type of the chaining parser is a -> a -> a, and that doesn't match the type of LValue that reflects the grammar. I also looked at many; however I don't have a constant prefix for each term - the left recursion term is what I'm trying to parse into part of the result type.

I imagine I'm missing a specific concept of Parsec/parsing and would love to be pointed in the right direction. There are more types in the language that I'm writing a parser for that will have similar constructions.

like image 991
Stuart Avatar asked Oct 30 '25 03:10

Stuart


1 Answers

The usual way to parse left-recursive grammars with tools that don't support left recursion, is indeed to replace the left recursion with repetition (i.e. many). For record access that'd mean replacing a rule like

lvalue ::= lvalue '.' ID
         | primaryLValue

with

lvalue ::= primaryLValue ('.' ID)*

In terms of Parsec that would mean:

record <- atomParser                                                                                                       
fields <- many (char '.' >> idParser)

Now you have an LValue and a list of 0 or more field names, which doesn't fit your AST type, but you can solve that by folding the RecordAccess constructor over the list.

like image 167
sepp2k Avatar answered Nov 02 '25 22:11

sepp2k



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!