I started using ANTLR today and I've created a basic parser.
After parsing I end up with a tree. To me it seems like this is just a bunch of String
s put together in a tree structure of Tree
-nodes. That's not very useful to me. I'd like to have a graph of objects.
To clarify (this is an example, and not my real application): For "5-1+6"
I seem to end up with:
new String("PLUS")
new String("MINUS")
new String("5")
new String("1")
new String("6")
What I would find more useful:
new Plus(
new Minus(
new IntegerLiteral(5),
new IntegerLiteral(1)),
new IntegerLiteral(6))
What is the most convenient way of going from the first representation to the other? In this article the author does something similar to this:
public Expression createExpr(CommonTree ast) {
// ...
switch (ast.getType()) {
case SimpleExpressionParser.INT:
return new IntegerLiteral(ast.getText())
case SimpleExpressionParser.PLUS:
return new Plus(createExpr((CommonTree)ast.getChild(0)), // recurse
createExpr((CommonTree)ast.getChild(1))); // recurse
case SimpleExpressionParser.MINUS:
return new Minus(createExpr((CommonTree)ast.getChild(0)), // recurse
createExpr((CommonTree)ast.getChild(1))); // recurse
}
// ...
}
Is this the preferred way?! Can't I instruct ANTLR to generate this boiler-plate code somehow (it will be huge)?
Possibly related questions:
Here's a a possible way. In short, these are the step you'd perform:
Let's create a small expression parser supporting +
, -
, *
, /
, (...)
and numbers, which could look like:
grammar Exp; // file: Exp.g
eval
: exp EOF
;
exp
: addExp
;
addExp
: mulExp ((Add | Sub) mulExp)*
;
mulExp
: unaryExp ((Mul | Div) unaryExp)*
;
unaryExp
: Sub atom
| atom
;
atom
: Number
| '(' exp ')'
;
Add : '+';
Sub : '-';
Mul : '*';
Div : '/';
Number : '0'..'9'+;
Space : ' ' {skip();};
Including rewrite rules, it will look like:
grammar Exp; // file: Exp.g
options {
output=AST;
}
tokens {
U_SUB;
}
eval
: exp EOF -> exp
;
exp
: addExp
;
addExp
: mulExp ((Add | Sub)^ mulExp)*
;
mulExp
: unaryExp ((Mul | Div)^ unaryExp)*
;
unaryExp
: Sub atom -> ^(U_SUB atom)
| atom
;
atom
: Number
| '(' exp ')' -> exp
;
Add : '+';
Sub : '-';
Mul : '*';
Div : '/';
Number : '0'..'9'+;
Space : ' ' {skip();};
Now an expression like 10 - 2 * (3 + 8)
will be transformed to:
To create a tree grammar that generates an iterator for the AST generated in (2), you'd do something like this:
tree grammar ExpWalker; // file: ExpWalker.g
options {
tokenVocab=Exp; // use the tokens from Exp.g
ASTLabelType=CommonTree;
}
eval
: exp
;
exp
: ^(Add exp exp)
| ^(Sub exp exp)
| ^(Mul exp exp)
| ^(Div exp exp)
| ^(U_SUB exp)
| Number
;
And to mix your custom classes in this tree iterator, do something like this:
tree grammar ExpWalker; // file: ExpWalker.g
options {
tokenVocab=Exp; // use the tokens from Exp.g
ASTLabelType=CommonTree;
}
eval returns [ExpNode e]
: exp {e = $exp.e;}
;
exp returns [ExpNode e]
: ^(Add a=exp b=exp) {e = new AddExp($a.e, $b.e);}
| ^(Sub a=exp b=exp) {e = new SubExp($a.e, $b.e);}
| ^(Mul a=exp b=exp) {e = new MulExp($a.e, $b.e);}
| ^(Div a=exp b=exp) {e = new DivExp($a.e, $b.e);}
| ^(U_SUB a=exp) {e = new UnaryExp($a.e);}
| Number {e = new NumberExp($Number.text);}
;
Here's some code to test all classes (stick it all just in one file: Main.java
):
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "10 - 2 * (3 + 8)";
ExpLexer lexer = new ExpLexer(new ANTLRStringStream(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExpParser parser = new ExpParser(tokens);
ExpParser.eval_return returnValue = parser.eval();
CommonTree tree = (CommonTree)returnValue.getTree();
CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
ExpWalker walker = new ExpWalker(nodes);
ExpNode root = walker.eval();
System.out.println(source + " = " + root.evaluate());
}
}
interface ExpNode {
double evaluate();
}
class NumberExp implements ExpNode {
final double num;
NumberExp(String s) {
num = Double.parseDouble(s);
}
@Override
public double evaluate() {
return num;
}
}
class AddExp implements ExpNode {
final ExpNode left, right;
AddExp(ExpNode a, ExpNode b) {
left = a;
right = b;
}
@Override
public double evaluate() {
return left.evaluate() + right.evaluate();
}
}
class SubExp implements ExpNode {
final ExpNode left, right;
SubExp(ExpNode a, ExpNode b) {
left = a;
right = b;
}
@Override
public double evaluate() {
return left.evaluate() - right.evaluate();
}
}
class MulExp implements ExpNode {
final ExpNode left, right;
MulExp(ExpNode a, ExpNode b) {
left = a;
right = b;
}
@Override
public double evaluate() {
return left.evaluate() * right.evaluate();
}
}
class DivExp implements ExpNode {
final ExpNode left, right;
DivExp(ExpNode a, ExpNode b) {
left = a;
right = b;
}
@Override
public double evaluate() {
return left.evaluate() / right.evaluate();
}
}
class UnaryExp implements ExpNode {
final ExpNode exp;
UnaryExp(ExpNode e) {
exp = e;
}
@Override
public double evaluate() {
return -exp.evaluate();
}
}
and then do:
# generate a lexer & parser java -cp antlr-3.2.jar org.antlr.Tool Exp.g # generate the tree walker java -cp antlr-3.2.jar org.antlr.Tool ExpWalker.g # compile everything javac -cp antlr-3.2.jar *.java # run the main class java -cp .:antlr-3.2.jar Main # *nix java -cp .;antlr-3.2.jar Main # Windows
which prints:
10 - 2 * (3 + 8) = -12.0
You could skip the tree-walker and mix all the code and returns [...]
inside your combined grammar, but IMO, a tree grammar keeps things more orderly because the lexer rules, and tokens like (
and )
etc. are removed from it.
HTH
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With