Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ANTLR4 visitor pattern on simple arithmetic example

I am a complete ANTLR4 newbie, so please forgive my ignorance. I ran into this presentation where a very simple arithmetic expression grammar is defined. It looks like:

grammar Expressions;  start : expr ;  expr  : left=expr op=('*'|'/') right=expr #opExpr       | left=expr op=('+'|'-') right=expr #opExpr       | atom=INT #atomExpr       ;  INT   : ('0'..'9')+ ;  WS    : [ \t\r\n]+ -> skip ; 

Which is great because it will generate a very simple binary tree that can be traversed using the visitor pattern as explained in the slides, e.g., here's the function that visits the expr:

public Integer visitOpExpr(OpExprContext ctx) {   int left = visit(ctx.left);   int right = visit(ctx.right);   String op = ctx.op.getText();   switch (op.charAt(0)) {     case '*': return left * right;     case '/': return left / right;     case '+': return left + right;     case '-': return left - right;     default: throw new IllegalArgumentException("Unkown opeator " + op);   } } 

The next thing I would like to add is support for parentheses. So I modified the expr as follows:

expr  : '(' expr ')'                      #opExpr       | left=expr op=('*'|'/') right=expr #opExpr       | left=expr op=('+'|'-') right=expr #opExpr       | atom=INT #atomExpr       ; 

Unfortunately, the code above fails because when encountering parentheses the three attributes op,left and right are null (fails with NPE).

I think I could work around that by defining a new attribute, e.g., parenthesized='(' expr ')', and then deal with that in the visitor code. However, it seems overkill to me to have a whole extra node type to represent an expression in parentheses. A simpler but uglier solution is to add the following line of code at the beginning of the visitOpExpr method:

if (ctx.op == null) return visit(ctx.getChild(1)); // 0 and 2 are the parentheses! 

I don't like the above at all because it's very fragile and highly dependent on the grammar structure.

I am wondering if there is a way to tell ANTLR to just "eat" the parentheses and treat the expression like a child. Is there? Is there a better way to do this?

Note: My end goal is to extend the example to include boolean expressions that can themselves contain arithmetic expressions, e.g., (2+4*3)/10 >= 11, that is, a relation (<,>,==,~=,etc.) between arithmetic expressions can define an atomic boolean expression. This is straight forward and I already have the grammar sketched out but I have the same problem with parenthesis, i.e., I need to be able to write stuff like (I will also add support for variables):

((2+4*x)/10 >= 11) | ( x>1 & x<3 ) 

EDIT: Fixed the precedence of the parenthesized expression, parenthesis always have higher precedence.

like image 782
Giovanni Botta Avatar asked Apr 15 '14 18:04

Giovanni Botta


People also ask

How does Antlr4 work?

The Antlr4 application generates a lexer from a set of lexer rules, as we'll see in our expression grammar momentarily. We specify rules; Antlr4 generates a lexer from them. Rules resemble assignment statements where names begin with uppercase alpha.

What is a visitor in Antlr?

ANTLR Visitors The difference between listener and visitor mechanisms is listener methods are called by the ANTLR-provided walker object, whereas visitor methods must walk their children with explicit visit calls. Forgetting to invoke visit() on a node's children means those subtrees don't get visited.

What is AST in Antlr?

ANTLR helps you build intermediate form trees, or abstract syntax trees (ASTs), by providing grammar annotations that indicate what tokens are to be treated as subtree roots, which are to be leaves, and which are to be ignored with respect to tree construction.

How do you use Antlr in Python?

What you need to do to get a parse tree: define a lexer and parser grammar. invoke ANTLR: it will generate a lexer and a parser in your target language (e.g., Java, Python, C#, JavaScript) use the generated lexer and parser: you invoke them passing the code to recognize and they return to you a parse tree.


1 Answers

Sure, just label it differently. After all, the alternative '(' expr ')' isn't a #opExpr:

expr  : left=expr op=('*'|'/') right=expr #opExpr       | left=expr op=('+'|'-') right=expr #opExpr       | '(' expr ')'                      #parenExpr       | atom=INT                          #atomExpr       ; 

And in your visitor, you'd do something like this:

public class EvalVisitor extends ExpressionsBaseVisitor<Integer> {      @Override     public Integer visitOpExpr(@NotNull ExpressionsParser.OpExprContext ctx) {         int left = visit(ctx.left);         int right = visit(ctx.right);         String op = ctx.op.getText();         switch (op.charAt(0)) {             case '*': return left * right;             case '/': return left / right;             case '+': return left + right;             case '-': return left - right;             default: throw new IllegalArgumentException("Unknown operator " + op);         }     }      @Override     public Integer visitStart(@NotNull ExpressionsParser.StartContext ctx) {         return this.visit(ctx.expr());     }      @Override     public Integer visitAtomExpr(@NotNull ExpressionsParser.AtomExprContext ctx) {         return Integer.valueOf(ctx.getText());     }      @Override     public Integer visitParenExpr(@NotNull ExpressionsParser.ParenExprContext ctx) {         return this.visit(ctx.expr());     }      public static void main(String[] args) {         String expression = "2 * (3 + 4)";         ExpressionsLexer lexer = new ExpressionsLexer(CharStreams.fromString(expression));         ExpressionsParser parser = new ExpressionsParser(new CommonTokenStream(lexer));         ParseTree tree = parser.start();         Integer answer = new EvalVisitor().visit(tree);         System.out.printf("%s = %s\n", expression, answer);     } } 

If you run the class above, you'd see the following output:

2 * (3 + 4) = 14
like image 113
Bart Kiers Avatar answered Oct 13 '22 20:10

Bart Kiers