I'm trying to get started with ANTLR and C# but I'm finding it extraordinarily difficult due to the lack of documentation/tutorials. I've found a couple half-hearted tutorials for older versions, but it seems there have been some major changes to the API since.
Can anyone give me a simple example of how to create a grammar and use it in a short program?
I've finally managed to get my grammar file compiling into a lexer and parser, and I can get those compiled and running in Visual Studio (after having to recompile the ANTLR source because the C# binaries seem to be out of date too! -- not to mention the source doesn't compile without some fixes), but I still have no idea what to do with my parser/lexer classes. Supposedly it can produce an AST given some input...and then I should be able to do something fancy with that.
ANTLR (ANother Tool for Language Recognition) is a tool for processing structured text. It does this by giving us access to language processing primitives like lexers, grammars, and parsers as well as the runtime to process text against them. It's often used to build tools and frameworks.
The latest version of ANTLR is 4.10. 1, released April 15, 2022.
In computer-based language recognition, ANTLR (pronounced antler), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development.
Let's say you want to parse simple expressions consisting of the following tokens:
-
subtraction (also unary);+
addition;*
multiplication;/
division;(...)
grouping (sub) expressions;An ANTLR grammar could look like this:
grammar Expression; options { language=CSharp2; } parse : exp EOF ; exp : addExp ; addExp : mulExp (('+' | '-') mulExp)* ; mulExp : unaryExp (('*' | '/') unaryExp)* ; unaryExp : '-' atom | atom ; atom : Number | '(' exp ')' ; Number : ('0'..'9')+ ('.' ('0'..'9')+)? ;
Now to create a proper AST, you add output=AST;
in your options { ... }
section, and you mix some "tree operators" in your grammar defining which tokens should be the root of a tree. There are two ways to do this:
^
and !
after your tokens. The ^
causes the token to become a root and the !
excludes the token from the ast;... -> ^(Root Child Child ...)
.Take the rule foo
for example:
foo : TokenA TokenB TokenC TokenD ;
and let's say you want TokenB
to become the root and TokenA
and TokenC
to become its children, and you want to exclude TokenD
from the tree. Here's how to do that using option 1:
foo : TokenA TokenB^ TokenC TokenD! ;
and here's how to do that using option 2:
foo : TokenA TokenB TokenC TokenD -> ^(TokenB TokenA TokenC) ;
So, here's the grammar with the tree operators in it:
grammar Expression; options { language=CSharp2; output=AST; } tokens { ROOT; UNARY_MIN; } @parser::namespace { Demo.Antlr } @lexer::namespace { Demo.Antlr } parse : exp EOF -> ^(ROOT exp) ; exp : addExp ; addExp : mulExp (('+' | '-')^ mulExp)* ; mulExp : unaryExp (('*' | '/')^ unaryExp)* ; unaryExp : '-' atom -> ^(UNARY_MIN atom) | atom ; atom : Number | '(' exp ')' -> exp ; Number : ('0'..'9')+ ('.' ('0'..'9')+)? ; Space : (' ' | '\t' | '\r' | '\n'){Skip();} ;
I also added a Space
rule to ignore any white spaces in the source file and added some extra tokens and namespaces for the lexer and parser. Note that the order is important (options { ... }
first, then tokens { ... }
and finally the @... {}
-namespace declarations).
That's it.
Now generate a lexer and parser from your grammar file:
java -cp antlr-3.2.jar org.antlr.Tool Expression.g
and put the .cs
files in your project together with the C# runtime DLL's.
You can test it using the following class:
using System; using Antlr.Runtime; using Antlr.Runtime.Tree; using Antlr.StringTemplate; namespace Demo.Antlr { class MainClass { public static void Preorder(ITree Tree, int Depth) { if(Tree == null) { return; } for (int i = 0; i < Depth; i++) { Console.Write(" "); } Console.WriteLine(Tree); Preorder(Tree.GetChild(0), Depth + 1); Preorder(Tree.GetChild(1), Depth + 1); } public static void Main (string[] args) { ANTLRStringStream Input = new ANTLRStringStream("(12.5 + 56 / -7) * 0.5"); ExpressionLexer Lexer = new ExpressionLexer(Input); CommonTokenStream Tokens = new CommonTokenStream(Lexer); ExpressionParser Parser = new ExpressionParser(Tokens); ExpressionParser.parse_return ParseReturn = Parser.parse(); CommonTree Tree = (CommonTree)ParseReturn.Tree; Preorder(Tree, 0); } } }
which produces the following output:
ROOT * + 12.5 / 56 UNARY_MIN 7 0.5
which corresponds to the following AST:
(diagram created using graph.gafol.net)
Note that ANTLR 3.3 has just been released and the CSharp target is "in beta". That's why I used ANTLR 3.2 in my example.
In case of rather simple languages (like my example above), you could also evaluate the result on the fly without creating an AST. You can do that by embedding plain C# code inside your grammar file, and letting your parser rules return a specific value.
Here's an example:
grammar Expression; options { language=CSharp2; } @parser::namespace { Demo.Antlr } @lexer::namespace { Demo.Antlr } parse returns [double value] : exp EOF {$value = $exp.value;} ; exp returns [double value] : addExp {$value = $addExp.value;} ; addExp returns [double value] : a=mulExp {$value = $a.value;} ( '+' b=mulExp {$value += $b.value;} | '-' b=mulExp {$value -= $b.value;} )* ; mulExp returns [double value] : a=unaryExp {$value = $a.value;} ( '*' b=unaryExp {$value *= $b.value;} | '/' b=unaryExp {$value /= $b.value;} )* ; unaryExp returns [double value] : '-' atom {$value = -1.0 * $atom.value;} | atom {$value = $atom.value;} ; atom returns [double value] : Number {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);} | '(' exp ')' {$value = $exp.value;} ; Number : ('0'..'9')+ ('.' ('0'..'9')+)? ; Space : (' ' | '\t' | '\r' | '\n'){Skip();} ;
which can be tested with the class:
using System; using Antlr.Runtime; using Antlr.Runtime.Tree; using Antlr.StringTemplate; namespace Demo.Antlr { class MainClass { public static void Main (string[] args) { string expression = "(12.5 + 56 / -7) * 0.5"; ANTLRStringStream Input = new ANTLRStringStream(expression); ExpressionLexer Lexer = new ExpressionLexer(Input); CommonTokenStream Tokens = new CommonTokenStream(Lexer); ExpressionParser Parser = new ExpressionParser(Tokens); Console.WriteLine(expression + " = " + Parser.parse()); } } }
and produces the following output:
(12.5 + 56 / -7) * 0.5 = 2.25
In the comments, Ralph wrote:
Tip for those using Visual Studio: you can put something like
java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g"
in the pre-build events, then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser.
Have you looked at Irony.net? It's aimed at .Net and therefore works really well, has proper tooling, proper examples and just works. The only problem is that it is still a bit 'alpha-ish' so documentation and versions seem to change a bit, but if you just stick with a version, you can do nifty things.
p.s. sorry for the bad answer where you ask a problem about X and someone suggests something different using Y ;^)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With