Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using ANTLR 3.3?

Tags:

c#

parsing

antlr

I'm trying to get started with ANTLR and C# but I'm finding it extraordinarily difficult due to the lack of documentation/tutorials. I've found a couple half-hearted tutorials for older versions, but it seems there have been some major changes to the API since.

Can anyone give me a simple example of how to create a grammar and use it in a short program?

I've finally managed to get my grammar file compiling into a lexer and parser, and I can get those compiled and running in Visual Studio (after having to recompile the ANTLR source because the C# binaries seem to be out of date too! -- not to mention the source doesn't compile without some fixes), but I still have no idea what to do with my parser/lexer classes. Supposedly it can produce an AST given some input...and then I should be able to do something fancy with that.

like image 870
mpen Avatar asked Dec 09 '10 08:12

mpen


People also ask

How does an ANTLR work?

ANTLR (ANother Tool for Language Recognition) is a tool for processing structured text. It does this by giving us access to language processing primitives like lexers, grammars, and parsers as well as the runtime to process text against them. It's often used to build tools and frameworks.

What is the latest version of ANTLR?

The latest version of ANTLR is 4.10. 1, released April 15, 2022.

Is ANTLR LL or LR?

In computer-based language recognition, ANTLR (pronounced antler), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), first developed in 1989, and is under active development.


2 Answers

Let's say you want to parse simple expressions consisting of the following tokens:

  • - subtraction (also unary);
  • + addition;
  • * multiplication;
  • / division;
  • (...) grouping (sub) expressions;
  • integer and decimal numbers.

An ANTLR grammar could look like this:

grammar Expression;  options {   language=CSharp2; }  parse   :  exp EOF    ;  exp   :  addExp   ;  addExp   :  mulExp (('+' | '-') mulExp)*   ;  mulExp   :  unaryExp (('*' | '/') unaryExp)*   ;  unaryExp   :  '-' atom    |  atom   ;  atom   :  Number   |  '(' exp ')'    ;  Number   :  ('0'..'9')+ ('.' ('0'..'9')+)?   ; 

Now to create a proper AST, you add output=AST; in your options { ... } section, and you mix some "tree operators" in your grammar defining which tokens should be the root of a tree. There are two ways to do this:

  1. add ^ and ! after your tokens. The ^ causes the token to become a root and the ! excludes the token from the ast;
  2. by using "rewrite rules": ... -> ^(Root Child Child ...).

Take the rule foo for example:

foo   :  TokenA TokenB TokenC TokenD   ; 

and let's say you want TokenB to become the root and TokenA and TokenC to become its children, and you want to exclude TokenD from the tree. Here's how to do that using option 1:

foo   :  TokenA TokenB^ TokenC TokenD!   ; 

and here's how to do that using option 2:

foo   :  TokenA TokenB TokenC TokenD -> ^(TokenB TokenA TokenC)   ; 

So, here's the grammar with the tree operators in it:

grammar Expression;  options {   language=CSharp2;   output=AST; }  tokens {   ROOT;   UNARY_MIN; }  @parser::namespace { Demo.Antlr } @lexer::namespace { Demo.Antlr }  parse   :  exp EOF -> ^(ROOT exp)   ;  exp   :  addExp   ;  addExp   :  mulExp (('+' | '-')^ mulExp)*   ;  mulExp   :  unaryExp (('*' | '/')^ unaryExp)*   ;  unaryExp   :  '-' atom -> ^(UNARY_MIN atom)   |  atom   ;  atom   :  Number   |  '(' exp ')' -> exp   ;  Number   :  ('0'..'9')+ ('.' ('0'..'9')+)?   ;  Space    :  (' ' | '\t' | '\r' | '\n'){Skip();}   ; 

I also added a Space rule to ignore any white spaces in the source file and added some extra tokens and namespaces for the lexer and parser. Note that the order is important (options { ... } first, then tokens { ... } and finally the @... {}-namespace declarations).

That's it.

Now generate a lexer and parser from your grammar file:

 java -cp antlr-3.2.jar org.antlr.Tool Expression.g 

and put the .cs files in your project together with the C# runtime DLL's.

You can test it using the following class:

using System; using Antlr.Runtime; using Antlr.Runtime.Tree; using Antlr.StringTemplate;  namespace Demo.Antlr {   class MainClass   {     public static void Preorder(ITree Tree, int Depth)      {       if(Tree == null)       {         return;       }        for (int i = 0; i < Depth; i++)       {         Console.Write("  ");       }        Console.WriteLine(Tree);        Preorder(Tree.GetChild(0), Depth + 1);       Preorder(Tree.GetChild(1), Depth + 1);     }      public static void Main (string[] args)     {       ANTLRStringStream Input = new ANTLRStringStream("(12.5 + 56 / -7) * 0.5");        ExpressionLexer Lexer = new ExpressionLexer(Input);       CommonTokenStream Tokens = new CommonTokenStream(Lexer);       ExpressionParser Parser = new ExpressionParser(Tokens);       ExpressionParser.parse_return ParseReturn = Parser.parse();       CommonTree Tree = (CommonTree)ParseReturn.Tree;       Preorder(Tree, 0);     }   } } 

which produces the following output:

 ROOT   *     +       12.5       /         56         UNARY_MIN           7     0.5 

which corresponds to the following AST:

alt text

(diagram created using graph.gafol.net)

Note that ANTLR 3.3 has just been released and the CSharp target is "in beta". That's why I used ANTLR 3.2 in my example.

In case of rather simple languages (like my example above), you could also evaluate the result on the fly without creating an AST. You can do that by embedding plain C# code inside your grammar file, and letting your parser rules return a specific value.

Here's an example:

grammar Expression;  options {   language=CSharp2; }  @parser::namespace { Demo.Antlr } @lexer::namespace { Demo.Antlr }  parse returns [double value]   :  exp EOF {$value = $exp.value;}   ;  exp returns [double value]   :  addExp {$value = $addExp.value;}   ;  addExp returns [double value]   :  a=mulExp       {$value = $a.value;}      ( '+' b=mulExp {$value += $b.value;}      | '-' b=mulExp {$value -= $b.value;}      )*   ;  mulExp returns [double value]   :  a=unaryExp       {$value = $a.value;}      ( '*' b=unaryExp {$value *= $b.value;}      | '/' b=unaryExp {$value /= $b.value;}      )*   ;  unaryExp returns [double value]   :  '-' atom {$value = -1.0 * $atom.value;}   |  atom     {$value = $atom.value;}   ;  atom returns [double value]   :  Number      {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}   |  '(' exp ')' {$value = $exp.value;}   ;  Number   :  ('0'..'9')+ ('.' ('0'..'9')+)?   ;  Space    :  (' ' | '\t' | '\r' | '\n'){Skip();}   ; 

which can be tested with the class:

using System; using Antlr.Runtime; using Antlr.Runtime.Tree; using Antlr.StringTemplate;  namespace Demo.Antlr {   class MainClass   {     public static void Main (string[] args)     {       string expression = "(12.5 + 56 / -7) * 0.5";       ANTLRStringStream Input = new ANTLRStringStream(expression);         ExpressionLexer Lexer = new ExpressionLexer(Input);       CommonTokenStream Tokens = new CommonTokenStream(Lexer);       ExpressionParser Parser = new ExpressionParser(Tokens);       Console.WriteLine(expression + " = " + Parser.parse());     }   } } 

and produces the following output:

 (12.5 + 56 / -7) * 0.5 = 2.25 

EDIT

In the comments, Ralph wrote:

Tip for those using Visual Studio: you can put something like java -cp "$(ProjectDir)antlr-3.2.jar" org.antlr.Tool "$(ProjectDir)Expression.g" in the pre-build events, then you can just modify your grammar and run the project without having to worry about rebuilding the lexer/parser.

like image 149
Bart Kiers Avatar answered Sep 17 '22 19:09

Bart Kiers


Have you looked at Irony.net? It's aimed at .Net and therefore works really well, has proper tooling, proper examples and just works. The only problem is that it is still a bit 'alpha-ish' so documentation and versions seem to change a bit, but if you just stick with a version, you can do nifty things.

p.s. sorry for the bad answer where you ask a problem about X and someone suggests something different using Y ;^)

like image 38
Toad Avatar answered Sep 20 '22 19:09

Toad