Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ANTLR Grammar if Statement

Tags:

antlr

I have been working on learning ANTLR in order to create a domain specific language. One of the requirements is to translate this DSL into C. I have been able to get a basic grammar that recognizes the DSL, however I am having issues translating this to C. Mainly, my problem comes from trying to translate the DSL if statement into a C if statement. I have tried using print statements in the grammar, to no avail (I am using C#).

Here is the grammar I have been testing with:

**ifTest.g**
grammar ifTest;

options
{
backtrack=true;
output=AST;
language=CSharp2;
}

/*************************
PARSER RULES
*************************/
prog    :   lambda
|   statements EOF;

lambda  :   /* Empty */;

statements
:   statement+;

statement
:   logical
|   assignment
|   NEWLINE;


logical :   IF a=logical_Expr THEN b=statements 
        {
            System.Console.Write("\tif (" + $a.text + ")\n\t{\n\t" + "\t" +     $b.text + "\n\n\t}");   
        }
        ( ELSE c=statements      
       {    
        System.Console.Write("\n\telse {\n\t\t\t" + $c.text + "\n\t}"); 
    } )?
    ENDIF   
    {
        System.Console.Write("\n}");
    }
;

logical_Expr
    :   expr    
    ;

expr    :   (simple_Expr) (op expr)*
    ;

simple_Expr     : MINUS expr
    | identifier
    | number
    ;

identifier  : parameter
    | VARIABLE
    ;

parameter   : norm_parameter
    ;

norm_parameter  : spec_label
    | reserved_parm
    ;

spec_label  : LABEL
                ;

reserved_parm   : RES_PARM
                ;

op  :   PLUS
|   MINUS
|   MULT
|   DIV
|   EQUALS
|   GT
|   LT
|   GE
|   LE
;

number      : INT
    | FLOAT
    | HEX
                ;

assignment  : identifier GETS expr
;

/*************************
    LEXER RULES
*************************/
WS  :       (' '|'\t')+ {$channel=HIDDEN;};

COMMENT :   '/*' (options {greedy=false;}:.)* '*/' {$channel=HIDDEN;}
                ;

LINECOMMENT
    :   '#' ~('\n'|'\r')* NEWLINE {$channel=HIDDEN;}
    ;

NEWLINE :   '\r'?'\n' {$channel=HIDDEN;};

IF  :   I F;
THEN    :   T H E N;
ELSE    :   E L S E;
ENDIF   :   E N D I F;

PLUS    :   '+';
MINUS   :   '-';
MULT    :   '*';
DIV :   '/';
EQUALS  :   '=';
GT  :   '>';
LT  :   '<';
GE  :   '>=';
LE  :   '<=';
ULINE   :   '_';
DOT :   '.';
GETS    :   ':=';

LABEL   :   (LETTER|ULINE)(LETTER|DIGIT|ULINE)*;

INT     :   '-'?DIGIT+;

FLOAT   :   '-'? DIGIT* DOT DIGIT+;

HEX :   ('0x'|'0X')(HEXDIGIT)HEXDIGIT*;

RES_PARM:    DIGIT LABEL;

VARIABLE:    '\$' LABEL;


fragment A:'A'|'a';    fragment B:'B'|'b';    fragment C:'C'|'c';    fragment D:'D'|'d';    
fragment E:'E'|'e';    fragment F:'F'|'f';    fragment G:'G'|'g';    fragment H:'H'|'h';    
fragment I:'I'|'i';    fragment J:'J'|'j';    fragment K:'K'|'k';    fragment L:'L'|'l';
fragment M:'M'|'m';    fragment N:'N'|'n';    fragment O:'O'|'o';    fragment P:'P'|'p';    
fragment Q:'Q'|'q';    fragment R:'R'|'r';    fragment S:'S'|'s';    fragment T:'T'|'t';    
fragment U:'U'|'u';    fragment V:'V'|'v';    fragment W:'W'|'w';    fragment X:'X'|'x';
fragment Y:'Y'|'y';    fragment Z:'Z'|'z';


fragment DIGIT
:   '0'..'9';

fragment LETTER
:   A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z;

fragment HEXDIGIT   
:   '0..9'|'a..f'|'A'..'F';

When testing this with this C# class

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Antlr.Runtime;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string inputString = "if $variable1 = 0 then\n  if $variable2 > 250 then\n   $variable3 := 0\n  endif\n endif";

            Console.WriteLine("Here is the input string:\n " + inputString + "\n");

            ANTLRStringStream input = new ANTLRStringStream(inputString);

            ifTestLexer lexer = new ifTestLexer(input);

            CommonTokenStream tokens = new CommonTokenStream(lexer);

            ifTestParser parser = new ifTestParser(tokens);

            parser.prog();

            Console.Read();
        }
    }
}

The output is not quite how I imagined.

**Output**
if ($variable2 > 250)
    {
            $variable3 := 0

    }
}       if ($variable1 = 0)
    {
            if $variable2 > 250 then
           $variable3 := 0
           endif

    }
}

The problem seems to be that the second if statement is printing twice, but not in the order I was hoping. I assume it has to do with me simply trying to emit the statements block within the print statements, but I am not quite sure how to go about getting this to work properly. I have been reading up on StringTemplate, or creating an AST and using a Tree Walker to walk it, but is there anyway to fix the above output to look something like this?

if ($variable1 = 0)
{
    if ($variable2 > 250)
    {
         $variable3 := 0
    }
}

Any help on which direction I should be taking would be greatly appreciated. Would it be better for me to take the leap to StringTemplate, or is there some way for me to do this with basic action code? If I left any information out, please feel free to ask.

like image 370
almostProgramming Avatar asked Feb 16 '12 15:02

almostProgramming


2 Answers

Yes, the problem is that you're trying to emit your 'compilation results' (a C program) during the parsing stage. The parser will backtrack and in general you cannot expect each section of the parser to only run once and to take the correct path each time.

The AST output is definitely what I would suggest looking at, and then walking the AST to produce your output. TreeWalker certainly sounds like a useful tool.

Overall, no, I do not believe that for any non-trivial grammar it could be possible to create your desired output with only parse actions.

Oddly enough, you're the second person I've seen try and do this in the past couple days. I can certainly see the attraction of the idea "do it all with the parser!", but I really don't think it's feasible. ANTLR's a heck of a tool, but its output is an AST; not a compiled executable.

Here's a link to the other similar question if you're interested:
Parsing Java code with ANTLR "need concept"

like image 190
Task Avatar answered Nov 15 '22 10:11

Task


If you remove the backtracking, which is easily done in your case, you can let the parser build the C code immediately.

Note that parser rules can take parameters (the indentation level in my example below) and can return custom objects (Strings in the example):

Here's your grammar without backtracking and outputting to C code (I'm not too good at C#, so the demo is in Java):

grammar ifTest;

prog    
 : statements[""] EOF {System.out.println($statements.str);}
 ;

statements[String indent] returns [String str]
@init{$str = "";}
 : (statement[indent] {$str += indent + $statement.str + "\n";})*
 ;

statement[String indent] returns [String str]
 : if_statement[indent] {$str = $if_statement.str;}
 | assignment           {$str = $assignment.str;}
 ;

if_statement[String indent] returns [String str]
 : IF expr THEN s1=statements[indent + "  "] {$str = "if (" + $expr.str + ")\n" + indent + "{\n" + $s1.str;}
   (ELSE s2=statements[indent + "  "]        {$str += indent + "}\n" + indent + "else\n" + indent + "{\n" + $s2.str;} )? 
   ENDIF                                     {$str += indent + "}";}
 ;

assignment returns [String str]
 : identifier GETS expr {$str = $identifier.str + " = " + $expr.str + ";";}
 ;

expr returns [String str]
 : rel_expr {$str = $rel_expr.str;}
 ;

rel_expr returns [String str]
 : e1=eq_expr {$str = $e1.str;} ( LT e2=eq_expr {$str += " < "  + $e2.str;}
                                | GT e2=eq_expr {$str += " > "  + $e2.str;}
                                | LE e2=eq_expr {$str += " <= " + $e2.str;}
                                | GE e2=eq_expr {$str += " >= " + $e2.str;}
                                )?
 ;

eq_expr returns [String str]
 : e1=add_expr {$str = $e1.str;} (EQUALS e2=add_expr {$str += " == " + $e2.str;})?
 ;

add_expr returns [String str]
 : e1=mult_expr {$str = $e1.str;} ( PLUS  e2=mult_expr {$str += " + " + $e2.str;}
                                  | MINUS e2=mult_expr {$str += " - " + $e2.str;}
                                  )*
 ;

mult_expr returns [String str]
 : e1=unary_expr {$str = $e1.str;} ( MULT e2=unary_expr {$str += " * " + $e2.str;}
                                   | DIV  e2=unary_expr {$str += " / " + $e2.str;}
                                   )*
 ;

unary_expr returns [String str]
 : MINUS term {$str = "-" + $term.str;}
 | term       {$str = $term.str;}
 ;

term returns [String str]
 : identifier {$str = $identifier.str;}
 | number     {$str = $number.text;}
 ;

identifier returns [String str]
 : LABEL    {$str = $LABEL.text;}
 | RES_PARM {$str = $RES_PARM.text;}
 | VARIABLE {$str = $VARIABLE.text.substring(1);}
 ;

number
 : INT   
 | FLOAT
 | HEX
 ;

WS          : (' '|'\t')+ {$channel=HIDDEN;};
COMMENT     : '/*' .* '*/' {$channel=HIDDEN;};
LINECOMMENT : '#' ~('\n'|'\r')* NEWLINE {$channel=HIDDEN;};
NEWLINE     : '\r'?'\n' {$channel=HIDDEN;};
IF          : I F;
THEN        : T H E N;
ELSE        : E L S E;
ENDIF       : E N D I F;
PLUS        : '+';
MINUS       : '-';
MULT        : '*';
DIV         : '/';
EQUALS      : '=';
GT          : '>';
LT          : '<';
GE          : '>=';
LE          : '<=';
ULINE       : '_';
DOT         : '.';
GETS        : ':=';
LABEL       : (LETTER | ULINE) (LETTER | DIGIT | ULINE)*;
INT         : DIGIT+;            // no '-' here, unary_expr handles this
FLOAT       : DIGIT* DOT DIGIT+; // no '-' here, unary_expr handles this
HEX         : '0' ('x'|'X') HEXDIGIT+;
RES_PARM    : DIGIT LABEL;
VARIABLE    : '$' LABEL;

fragment A:'A'|'a';    fragment B:'B'|'b';    fragment C:'C'|'c';    fragment D:'D'|'d';    
fragment E:'E'|'e';    fragment F:'F'|'f';    fragment G:'G'|'g';    fragment H:'H'|'h';    
fragment I:'I'|'i';    fragment J:'J'|'j';    fragment K:'K'|'k';    fragment L:'L'|'l';
fragment M:'M'|'m';    fragment N:'N'|'n';    fragment O:'O'|'o';    fragment P:'P'|'p';    
fragment Q:'Q'|'q';    fragment R:'R'|'r';    fragment S:'S'|'s';    fragment T:'T'|'t';    
fragment U:'U'|'u';    fragment V:'V'|'v';    fragment W:'W'|'w';    fragment X:'X'|'x';
fragment Y:'Y'|'y';    fragment Z:'Z'|'z';

fragment HEXDIGIT : DIGIT |'a..f'|'A'..'F';
fragment DIGIT    : '0'..'9';
fragment LETTER   : A | B | C | D | E | F | G | H | I | J | K | L | M 
                  | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
                  ;

If you now test your parser with the input:

if $variable1 = 0 then
  if $variable2 > 250 then
    $variable3 := 0
  else
    $variable3 := 42
  endif
endif

the following is printed to the console:

if (variable1 == 0)
{
  if (variable2 > 250)
  {
    variable3 = 0;
  }
  else
  {
    variable3 = 42;
  }
}

If other parts of your grammar rely (heavily) on predicates (backtracking), the same strategy as above could just as easily be applied but then in a tree grammar (so after the backtracking-parser did its job and produced an AST).

like image 45
Bart Kiers Avatar answered Nov 15 '22 08:11

Bart Kiers