Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ANTLR AST rules fail with RewriteEmptyStreamException

Tags:

parsing

antlr

I have a simple grammar:

grammar sample;
options { output = AST; }
assignment
    : IDENT ':=' expr ';'
    ;
expr    
    : factor ('*' factor)*
    ;
factor
    : primary ('+' primary)*
    ;
primary
    : NUM
    | '(' expr ')'
    ;
IDENT : ('a'..'z')+ ;
NUM   : ('0'..'9')+ ;
WS    : (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ;

Now I want to add some rewrite rules to generate an AST. From what I've read online and in the Language Patterns book, I should be able to modify the grammar like this:

assignment
    : IDENT ':=' expr ';'   -> ^(':=' IDENT expr)
    ;
expr    
    : factor ('*' factor)* -> ^('*' factor+)
    ;
factor  
    : primary ('+' primary)* -> ^('+' primary+)
    ;
primary
    : NUM
    | '(' expr ')' -> ^(expr)
    ;

But it does not work. Although it compiles fine, when I run the parser I get a RewriteEmptyStreamException error. Here's where things get weird.

If I define the pseudo tokens ADD and MULT and use them instead of the tree node literals, it works without error.

tokens { ADD; MULT; }

expr    
    : factor ('*' factor)* -> ^(MULT factor+)
    ;
factor  
    : primary ('+' primary)* -> ^(ADD primary+)
    ;

Alternatively, if I use the node suffix notation, it also appears to work fine:

expr    
    : factor ('*'^ factor)*
    ;
factor  
    : primary ('+'^ primary)*
    ;

Is this discrepancy in behavior a bug?

like image 699
Barry Brown Avatar asked Apr 26 '10 00:04

Barry Brown


2 Answers

No, not a bug, AFAIK. Take your expr rule for example:

expr    
    : factor ('*' factor)* -> ^('*' factor+)
    ;

since the * might not be present, it should also not be in your AST rewrite rule. So, the above is incorrect and ANTLR complaining about it is correct.

Now if you insert an imaginary token like MULT instead:

expr    
    : factor ('*' factor)* -> ^(MULT factor+)
    ;

all is okay since your rule will always produce one or more factor's.

What you probably meant to do is something like this:

expr    
    :  (factor -> factor) ('*' f=factor -> ^('*' $expr $f))*
    ;

Also see chapter 7: Tree Construction from The Definitive ANTLR Reference. Especially the paragraphs Rewrite Rules in Subrules (page 173) and Referencing Previous Rule ASTs in Rewrite Rules (page 174/175).

like image 106
Bart Kiers Avatar answered Nov 02 '22 02:11

Bart Kiers


If you want to generate an N-ary tree for the '*' operator with all children at the same level you can do this:

expr
    : (s=factor -> factor) (('*' factor)+ -> ^('*' $s factor+))?
    ;

Here are some examples of what this will return:

Tokens: AST
factor: factor
factor '*' factor: ^('*' factor factor)
factor '*' factor '*' factor: ^('*' factor factor factor)

Bart's third example above will produce a nested tree, since the result of $expr for each successive iteration is a node with two children, like this:

factor * factor * factor: ^('*' factor ^('*' factor factor))

which you probably don't need since multiplication is commutative.

like image 32
JoelPM Avatar answered Nov 02 '22 03:11

JoelPM