Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing Left Recursion in ANTLR

As is explained in Removing left recursion , there are two ways to remove the left recursion.

  • Modify the original grammar to remove the left recursion using some procedure
  • Write the grammar originally not to have the left recursion

What people normally use for removing (not having) the left recursion with ANTLR? I've used flex/bison for parser, but I need to use ANTLR. The only thing I'm concerned about using ANTLR (or LL parser in genearal) is left recursion removal.

  • In practical sense, how serious of removing left recursion in ANTLR? Is this a showstopper in using ANTLR? Or, nobody cares about it in ANTLR community?
  • I like the idea of AST generation of ANTLR. In terms of getting AST quick and easy way, which method (out of the 2 removing left recursion methods) is preferable?

Added

I did some experiment with the following grammar.

E -> E + T|T
T -> T * F|F
F -> INT | ( E )

After left recursion removal, I get the following one

E -> TE'
E' -> null | + TE'
T -> FT'
T' -> null | * FT'

I could come up with the following ANTLR representation. Even though, It's relatively pretty simple and straightforward, it seems the grammar that doesn't have the left recursion should be the better way to go.

grammar T;

options {
    language=Python;
}

start returns [value]
   : e {$value = $e.value};
e returns [value]
   : t ep  
     {
       $value = $t.value
       if $ep.value != None:
         $value += $ep.value
     }
   ;
ep returns [value]
   : {$value = None}
   | '+' t r = ep 
     {
       $value = $t.value
       if $r.value != None:
            $value += $r.value
     }
   ;
t returns [value]
  : f tp 
    {
      $value = $f.value
      if $tp.value != None:
        $value *= $tp.value
    }
  ;
tp returns [value]
  : {$value = None}
  | '*' f r = tp 
    {
      $value = $f.value;
      if $r.value != None:
        $value *= $r.value
    }
  ;
f returns [int value]
  : INT {$value = int($INT.text)}
  | '(' e ')' {$value = $e.value}
  ;

INT :   '0'..'9'+ ;
WS: (' '|'\n'|'\r')+ {$channel=HIDDEN;} ;
like image 750
prosseek Avatar asked Jun 08 '10 17:06

prosseek


1 Answers

This is only orthogonally relevant, but I just published a preprint of a paper on a new parsing method that I call "pika parsing" (c.f. packrat parsing) that directly handles left recursive grammars without the need for rule rewriting.

https://arxiv.org/abs/2005.06444

like image 84
Luke Hutchison Avatar answered Sep 17 '22 11:09

Luke Hutchison