Add the package name that you want to see in the Java file in which the lexer and parser files will be created. Add the Language in which you want the output like Java , Python etc. Tick the generate parser tree listener and generate tree visitor if you want to modify the visitor. Now the configuration is done.
ANTLR (ANother Tool for Language Recognition) is a tool for processing structured text. It does this by giving us access to language processing primitives like lexers, grammars, and parsers as well as the runtime to process text against them. It's often used to build tools and frameworks.
What you need to do to get a parse tree: define a lexer and parser grammar. invoke ANTLR: it will generate a lexer and a parser in your target language (e.g., Java, Python, C#, JavaScript) use the generated lexer and parser: you invoke them passing the code to recognize and they return to you a parse tree.
ANTLR is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It's widely used in academia and industry to build all sorts of languages, tools, and frameworks.
Note: this answer is for ANTLR3! If you're looking for an ANTLR4 example, then this Q&A demonstrates how to create a simple expression parser, and evaluator using ANTLR4.
You first create a grammar. Below is a small grammar that you can use to evaluate expressions that are built using the 4 basic math operators: +, -, * and /. You can also group expressions using parenthesis.
Note that this grammar is just a very basic one: it does not handle unary operators (the minus in: -1+9) or decimals like .99 (without a leading number), to name just two shortcomings. This is just an example you can work on yourself.
Here's the contents of the grammar file Exp.g:
grammar Exp;
/* This will be the entry point of our parser. */
eval
: additionExp
;
/* Addition and subtraction have the lowest precedence. */
additionExp
: multiplyExp
( '+' multiplyExp
| '-' multiplyExp
)*
;
/* Multiplication and division have a higher precedence. */
multiplyExp
: atomExp
( '*' atomExp
| '/' atomExp
)*
;
/* An expression atom is the smallest part of an expression: a number. Or
when we encounter parenthesis, we're making a recursive call back to the
rule 'additionExp'. As you can see, an 'atomExp' has the highest precedence. */
atomExp
: Number
| '(' additionExp ')'
;
/* A number: can be an integer value, or a decimal value */
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
/* We're going to ignore all white space characters */
WS
: (' ' | '\t' | '\r'| '\n') {$channel=HIDDEN;}
;
(Parser rules start with a lower case letter, and lexer rules start with a capital letter)
After creating the grammar, you'll want to generate a parser and lexer from it. Download the ANTLR jar and store it in the same directory as your grammar file.
Execute the following command on your shell/command prompt:
java -cp antlr-3.2.jar org.antlr.Tool Exp.g
It should not produce any error message, and the files ExpLexer.java, ExpParser.java and Exp.tokens should now be generated.
To see if it all works properly, create this test class:
import org.antlr.runtime.*;
public class ANTLRDemo {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("12*(5-6)");
ExpLexer lexer = new ExpLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExpParser parser = new ExpParser(tokens);
parser.eval();
}
}
and compile it:
// *nix/MacOS
javac -cp .:antlr-3.2.jar ANTLRDemo.java
// Windows
javac -cp .;antlr-3.2.jar ANTLRDemo.java
and then run it:
// *nix/MacOS
java -cp .:antlr-3.2.jar ANTLRDemo
// Windows
java -cp .;antlr-3.2.jar ANTLRDemo
If all goes well, nothing is being printed to the console. This means the parser did not find any error. When you change "12*(5-6)"
into "12*(5-6"
and then recompile and run it, there should be printed the following:
line 0:-1 mismatched input '<EOF>' expecting ')'
Okay, now we want to add a bit of Java code to the grammar so that the parser actually does something useful. Adding code can be done by placing {
and }
inside your grammar with some plain Java code inside it.
But first: all parser rules in the grammar file should return a primitive double value. You can do that by adding returns [double value]
after each rule:
grammar Exp;
eval returns [double value]
: additionExp
;
additionExp returns [double value]
: multiplyExp
( '+' multiplyExp
| '-' multiplyExp
)*
;
// ...
which needs little explanation: every rule is expected to return a double value. Now to "interact" with the return value double value
(which is NOT inside a plain Java code block {...}
) from inside a code block, you'll need to add a dollar sign in front of value
:
grammar Exp;
/* This will be the entry point of our parser. */
eval returns [double value]
: additionExp { /* plain code block! */ System.out.println("value equals: "+$value); }
;
// ...
Here's the grammar but now with the Java code added:
grammar Exp;
eval returns [double value]
: exp=additionExp {$value = $exp.value;}
;
additionExp returns [double value]
: m1=multiplyExp {$value = $m1.value;}
( '+' m2=multiplyExp {$value += $m2.value;}
| '-' m2=multiplyExp {$value -= $m2.value;}
)*
;
multiplyExp returns [double value]
: a1=atomExp {$value = $a1.value;}
( '*' a2=atomExp {$value *= $a2.value;}
| '/' a2=atomExp {$value /= $a2.value;}
)*
;
atomExp returns [double value]
: n=Number {$value = Double.parseDouble($n.text);}
| '(' exp=additionExp ')' {$value = $exp.value;}
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
WS
: (' ' | '\t' | '\r'| '\n') {$channel=HIDDEN;}
;
and since our eval
rule now returns a double, change your ANTLRDemo.java into this:
import org.antlr.runtime.*;
public class ANTLRDemo {
public static void main(String[] args) throws Exception {
ANTLRStringStream in = new ANTLRStringStream("12*(5-6)");
ExpLexer lexer = new ExpLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExpParser parser = new ExpParser(tokens);
System.out.println(parser.eval()); // print the value
}
}
Again (re) generate a fresh lexer and parser from your grammar (1), compile all classes (2) and run ANTLRDemo (3):
// *nix/MacOS
java -cp antlr-3.2.jar org.antlr.Tool Exp.g // 1
javac -cp .:antlr-3.2.jar ANTLRDemo.java // 2
java -cp .:antlr-3.2.jar ANTLRDemo // 3
// Windows
java -cp antlr-3.2.jar org.antlr.Tool Exp.g // 1
javac -cp .;antlr-3.2.jar ANTLRDemo.java // 2
java -cp .;antlr-3.2.jar ANTLRDemo // 3
and you'll now see the outcome of the expression 12*(5-6)
printed to your console!
Again: this is a very brief explanation. I encourage you to browse the ANTLR wiki and read some tutorials and/or play a bit with what I just posted.
Good luck!
EDIT:
This post shows how to extend the example above so that a Map<String, Double>
can be provided that holds variables in the provided expression.
To get this code working with a current version of Antlr (June 2014) I needed to make a few changes. ANTLRStringStream
needed to become ANTLRInputStream
, the returned value needed to change from parser.eval()
to parser.eval().value
, and I needed to remove the WS
clause at the end, because attribute values such as $channel
are no longer allowed to appear in lexer actions.
ANTLR mega tutorial by Gabriele Tomassetti is very helpful
It has grammar examples, examples of visitors in different languages (Java, JavaScript, C# and Python) and many other things. Highly recommended.
EDIT: other useful articles by Gabriele Tomassetti on ANTLR
For Antlr 4 the java code generation process is below:-
java -cp antlr-4.5.3-complete.jar org.antlr.v4.Tool Exp.g
Update your jar name in classpath accordingly.
version 4.7.1 was slightly different : for import:
import org.antlr.v4.runtime.*;
for the main segment - note the CharStreams:
CharStream in = CharStreams.fromString("12*(5-6)");
ExpLexer lexer = new ExpLexer(in);
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExpParser parser = new ExpParser(tokens);
At https://github.com/BITPlan/com.bitplan.antlr you'll find an ANTLR java library with some useful helper classes and a few complete examples. It's ready to be used with maven and if you like eclipse and maven.
https://github.com/BITPlan/com.bitplan.antlr/blob/master/src/main/antlr4/com/bitplan/exp/Exp.g4
is a simple Expression language that can do multiply and add operations. https://github.com/BITPlan/com.bitplan.antlr/blob/master/src/test/java/com/bitplan/antlr/TestExpParser.java has the corresponding unit tests for it.
https://github.com/BITPlan/com.bitplan.antlr/blob/master/src/main/antlr4/com/bitplan/iri/IRIParser.g4 is an IRI parser that has been split into the three parts:
https://github.com/BITPlan/com.bitplan.antlr/blob/master/src/test/java/com/bitplan/antlr/TestIRIParser.java has the unit tests for it.
Personally I found this the most tricky part to get right. See http://wiki.bitplan.com/index.php/ANTLR_maven_plugin
https://github.com/BITPlan/com.bitplan.antlr/tree/master/src/main/antlr4/com/bitplan/expr
contains three more examples that have been created for a performance issue of ANTLR4 in an earlier version. In the meantime this issues has been fixed as the testcase https://github.com/BITPlan/com.bitplan.antlr/blob/master/src/test/java/com/bitplan/antlr/TestIssue994.java shows.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With