Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mapping ANTLR parse rules to custom Java AST classes for code generation

I seem to be struggling with the AST->StringTemplate side of things, probably cause I'm coming from writing parsers by hand -> LLVM.

What I'm looking for is a way to automatically match up a parsing rule to an AST class that can represent it and contains a method to generate the target language output. (probably using StringTemplate, in this case.)

In pseudo code, given this example grammar:

numberExpression
    : DIGIT+
    ;

I want to have it mapped to this AST class:

class NumberExpressionAST extends BaseAST {
    private double value;

    public NumberExpressionAST(node) {
        this.value = node.value;
    }

    public String generateCode() {
        // However we want to generate the output.
        // Maybe use a template, maybe string literals, maybe cure cancer...whatever.
    }
}

To mate them up, maybe there would be some glue like below: (or you could go crazy with Class.forName stuff)

switch (child.ruleName) {
    case 'numberExpression':
        return new NumberExpressionAST(child);
        break;
}

I've been scouring the web and I found parse rewrite rules in the grammar with -> but I can't seem to figure out how to keep all this logic out of the grammar. Especially the code to setup and generate the target output from the template. I'm OK with walking the tree multiple times.

I thought that maybe I could use the option output=AST and then maybe provide my own AST classes extending from the CommonTree? I'll admit, my grasp on ANTLR is very primitive, so forgive my ignorance. Every tutorial I follow shows doing all this stuff inline with the grammar which to me is totally insane and hard to maintain.

Can someone point me to a way of accomplishing something similar?

Goal: keep AST/codegen/template logic out of the grammar.

EDIT ---------------------------------------------

I've resorted to tracing through ANTLR's actual source code (since they use themselves) and I'm seeing similar things like BlockAST, RuleAST, etc all inheriting from CommonTree. I haven't quite figured out the important part...how they're using them..

From looking around, I noticed you can basically type hint tokens:

identifier
    : IDENTIFIER<AnyJavaClassIWantAST>
    ;

You can't do exactly the same for parse rules...but if you create some token to represent the parse rule as a whole, you can use rewrite rules like so:

declaration
    : type identifier -> SOME_PARSE_RULE<AnyJavaClassIWantAST>
    ;

All this is closer to what I want, but ideally I shouldn't have to litter the grammar...is there any way to put these somewhere else?

like image 507
jayphelps Avatar asked Nov 27 '12 04:11

jayphelps


People also ask

What is AST ANTLR?

ANTLR helps you build intermediate form trees, or abstract syntax trees (ASTs), by providing grammar annotations that indicate what tokens are to be treated as subtree roots, which are to be leaves, and which are to be ignored with respect to tree construction.

What does ANTLR generate?

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.

How does ANTLR parser work?

ANTLR (ANother Tool for Language Recognition) is a tool for processing structured text. It does this by giving us access to language processing primitives like lexers, grammars, and parsers as well as the runtime to process text against them. It's often used to build tools and frameworks.

What is Lexer and parser in ANTLR?

A lexer (often called a scanner) breaks up an input stream of characters into vocabulary symbols for a parser, which applies a grammatical structure to that symbol stream.


1 Answers

Could you add this as an answer...

Here is a contrived example that uses a handful of ANTLR4's features that go a long way towards separating the grammar from the output language, mainly the alternative labels and the generated listener. This example grammar can represent a few trivial bits of code, but it does so with no language references -- not even a call to skip() for whitespace in the lexer. The test class converts the input to some Java output using the generated listener.

I avoided using anything that I couldn't get to work on the first couple of tries, so don't consider this an exhaustive example by any means.

Simplang.g

grammar Simplang;


compilationUnit : statements EOF;
statements      : statement+;
statement       : block #BlockStatement 
                | call  #CallStatement
                | decl  #DeclStatement
                ;
block           : LCUR statements RCUR;    
call            : methodName LPAR args=arglist? RPAR SEMI;
methodName      : ID;
arglist         : arg (COMMA arg)*;
arg             : expr;    
decl            : VAR variableName EQ expr SEMI;
variableName    : ID;
expr            : add_expr;     
    
add_expr        : lhs=primary_expr (add_op rhs=primary_expr)*;
add_op          : PLUS | MINUS;    
primary_expr    : string=STRING
                | id=ID
                | integer=INT
                ;    
    
VAR: 'var';   
ID: ('a'..'z'|'A'..'Z')+;
INT: ('0'..'9')+;
STRING: '\'' ~('\r'|'\n'|'\'')* '\'';
SEMI: ';';
LPAR: '(';
RPAR: ')';
LCUR: '{';
RCUR: '}';
PLUS: '+';
MINUS: '-';    
COMMA: ',';
EQ: '=';
WS: (' '|'\t'|'\f'|'\r'|'\n') -> skip;

Along with the lexer and parser, ANTLR4 generates a listener interface and default empty implementing class. Here's the interface generated for the grammar above.

SimplangListener.java

public interface SimplangListener extends ParseTreeListener {
    void enterArglist(SimplangParser.ArglistContext ctx);
    void exitArglist(SimplangParser.ArglistContext ctx);
    void enterCall(SimplangParser.CallContext ctx);
    void exitCall(SimplangParser.CallContext ctx);
    void enterCompilationUnit(SimplangParser.CompilationUnitContext ctx);
    void exitCompilationUnit(SimplangParser.CompilationUnitContext ctx);
    void enterVariableName(SimplangParser.VariableNameContext ctx);
    void exitVariableName(SimplangParser.VariableNameContext ctx);
    void enterBlock(SimplangParser.BlockContext ctx);
    void exitBlock(SimplangParser.BlockContext ctx);
    void enterExpr(SimplangParser.ExprContext ctx);
    void exitExpr(SimplangParser.ExprContext ctx);
    void enterPrimary_expr(SimplangParser.Primary_exprContext ctx);
    void exitPrimary_expr(SimplangParser.Primary_exprContext ctx);
    void enterAdd_expr(SimplangParser.Add_exprContext ctx);
    void exitAdd_expr(SimplangParser.Add_exprContext ctx);
    void enterArg(SimplangParser.ArgContext ctx);
    void exitArg(SimplangParser.ArgContext ctx);
    void enterAdd_op(SimplangParser.Add_opContext ctx);
    void exitAdd_op(SimplangParser.Add_opContext ctx);
    void enterStatements(SimplangParser.StatementsContext ctx);
    void exitStatements(SimplangParser.StatementsContext ctx);
    void enterBlockStatement(SimplangParser.BlockStatementContext ctx);
    void exitBlockStatement(SimplangParser.BlockStatementContext ctx);
    void enterCallStatement(SimplangParser.CallStatementContext ctx);
    void exitCallStatement(SimplangParser.CallStatementContext ctx);
    void enterMethodName(SimplangParser.MethodNameContext ctx);
    void exitMethodName(SimplangParser.MethodNameContext ctx);
    void enterDeclStatement(SimplangParser.DeclStatementContext ctx);
    void exitDeclStatement(SimplangParser.DeclStatementContext ctx);
    void enterDecl(SimplangParser.DeclContext ctx);
    void exitDecl(SimplangParser.DeclContext ctx);
}

Here's a test class that overrides a few methods in the empty listener and calls the parser.

SimplangTest.java

public class SimplangTest {

    public static void main(String[] args) {

        ANTLRInputStream input = new ANTLRInputStream(
                "var x = 4;\nfoo(x, 10);\nbar(y + 10 - 1, 'x' + 'y' + 'z');");

        SimplangLexer lexer = new SimplangLexer(input);

        SimplangParser parser = new SimplangParser(new CommonTokenStream(lexer));

        parser.addParseListener(new SimplangBaseListener() {
            public void exitArg(SimplangParser.ArgContext ctx) {
                System.out.print(", ");
            }

            public void exitCall(SimplangParser.CallContext call) {
                System.out.print("})");
            }

            public void exitMethodName(SimplangParser.MethodNameContext ctx) {
                System.out.printf("call(\"%s\", new Object[]{", ctx.ID()
                        .getText());
            }

            public void exitCallStatement(SimplangParser.CallStatementContext ctx) {
                System.out.println(";");
            }

            public void enterDecl(SimplangParser.DeclContext ctx) {
                System.out.print("define(");
            }

            public void exitVariableName(SimplangParser.VariableNameContext ctx) {
                System.out.printf("\"%s\", ", ctx.ID().getText());
            }

            public void exitDeclStatement(SimplangParser.DeclStatementContext ctx) {
                System.out.println(");");
            }

            public void exitAdd_op(SimplangParser.Add_opContext ctx) {
                if (ctx.MINUS() != null) {
                    System.out.print(" - ");
                } else {
                    System.out.print(" + ");
                }
            }

            public void exitPrimary_expr(SimplangParser.Primary_exprContext ctx) {
                if (ctx.string != null) {
                    String value = ctx.string.getText();
                    System.out.printf("\"%s\"",
                            value.subSequence(1, value.length() - 1));
                } else if (ctx.altNum == 2){    //cheating and using the alt# for "INT"
                    System.out.printf("read(\"%s\")", ctx.id.getText());
                } else {
                    System.out.print(ctx.INT().getText());
                }
            }
        });

        parser.compilationUnit();
    }
}

Here's the test input hard-coded in the test class:

var x = 4;
foo(x, 10);
bar(y + 10 - 1, 'x' + 'y' + 'z');

Here's the output produced:

define("x", 4);
call("foo", new Object[]{read("x"), 10, });
call("bar", new Object[]{read("y") + 10 - 1, "x" + "y" + "z", });

It's a silly example, but it shows a few of the features that might be useful to you when building a custom AST.

like image 107
user1201210 Avatar answered Oct 06 '22 02:10

user1201210