I have a simple grammar:
grammar sample;
options { output = AST; }
assignment
: IDENT ':=' expr ';'
;
expr
: factor ('*' factor)*
;
factor
: primary ('+' primary)*
;
primary
: NUM
| '(' expr ')'
;
IDENT : ('a'..'z')+ ;
NUM : ('0'..'9')+ ;
WS : (' '|'\n'|'\t'|'\r')+ {$channel=HIDDEN;} ;
Now I want to add some rewrite rules to generate an AST. From what I've read online and in the Language Patterns book, I should be able to modify the grammar like this:
assignment
: IDENT ':=' expr ';' -> ^(':=' IDENT expr)
;
expr
: factor ('*' factor)* -> ^('*' factor+)
;
factor
: primary ('+' primary)* -> ^('+' primary+)
;
primary
: NUM
| '(' expr ')' -> ^(expr)
;
But it does not work. Although it compiles fine, when I run the parser I get a RewriteEmptyStreamException error. Here's where things get weird.
If I define the pseudo tokens ADD and MULT and use them instead of the tree node literals, it works without error.
tokens { ADD; MULT; }
expr
: factor ('*' factor)* -> ^(MULT factor+)
;
factor
: primary ('+' primary)* -> ^(ADD primary+)
;
Alternatively, if I use the node suffix notation, it also appears to work fine:
expr
: factor ('*'^ factor)*
;
factor
: primary ('+'^ primary)*
;
Is this discrepancy in behavior a bug?
No, not a bug, AFAIK. Take your expr
rule for example:
expr
: factor ('*' factor)* -> ^('*' factor+)
;
since the *
might not be present, it should also not be in your AST rewrite rule. So, the above is incorrect and ANTLR complaining about it is correct.
Now if you insert an imaginary token like MULT
instead:
expr
: factor ('*' factor)* -> ^(MULT factor+)
;
all is okay since your rule will always produce one or more factor
's.
What you probably meant to do is something like this:
expr
: (factor -> factor) ('*' f=factor -> ^('*' $expr $f))*
;
Also see chapter 7: Tree Construction from The Definitive ANTLR Reference. Especially the paragraphs Rewrite Rules in Subrules (page 173) and Referencing Previous Rule ASTs in Rewrite Rules (page 174/175).
If you want to generate an N-ary tree for the '*' operator with all children at the same level you can do this:
expr
: (s=factor -> factor) (('*' factor)+ -> ^('*' $s factor+))?
;
Here are some examples of what this will return:
Tokens: AST
factor: factor
factor '*' factor: ^('*' factor factor)
factor '*' factor '*' factor: ^('*' factor factor factor)
Bart's third example above will produce a nested tree, since the result of $expr for each successive iteration is a node with two children, like this:
factor * factor * factor: ^('*' factor ^('*' factor factor))
which you probably don't need since multiplication is commutative.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With