Mapping ANTLR parse rules to custom Java AST classes for code generation

I seem to be struggling with the AST->StringTemplate side of things, probably because I'm used to writing parsers by hand and generating LLVM from them.

What I'm looking for is a way to automatically map each parser rule to an AST class that represents it and has a method that generates the target-language output (probably using StringTemplate, in this case).

In pseudo code, given this example grammar:

numberExpression
    : DIGIT+
    ;

I want to have it mapped to this AST class:

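import org.antlr.runtime.tree.Tree;

abstract class BaseAST {
    public abstract String generateCode();
}

class NumberExpressionAST extends BaseAST {
    private final double value;

    public NumberExpressionAST(Tree node) {
        this.value = Double.parseDouble(node.getText());
    }

    @Override
    public String generateCode() {
        // However we want to generate the output.
        // Maybe use a template, maybe string literals, maybe cure cancer...whatever.
        return Double.toString(value);
    }
}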

To match them up, there could be some glue code like the following (or you could go crazy with Class.forName tricks):

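switch (child.ruleName) {
    case "numberExpression":
        return new NumberExpressionAST(child);
    // ...one case per parser rule...
    default:
        throw new IllegalArgumentException("No AST class for rule " + child.ruleName);
}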
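As an alternative to a hard-coded switch or reflection with Class.forName, the rule-to-class mapping can live in a small registry of constructor references that sits outside both the grammar and the AST classes. This is a minimal, stdlib-only sketch under stated assumptions: ParseNode, AstRegistry and IdentifierAST are hypothetical names, and ParseNode merely stands in for whatever node type the parser actually hands you.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a parse-tree node: the rule that produced it and its matched text.
final class ParseNode {
    final String ruleName;
    final String text;

    ParseNode(String ruleName, String text) {
        this.ruleName = ruleName;
        this.text = text;
    }
}

abstract class BaseAST {
    abstract String generateCode();
}

final class NumberExpressionAST extends BaseAST {
    private final double value;

    NumberExpressionAST(ParseNode node) {
        this.value = Double.parseDouble(node.text);
    }

    @Override
    String generateCode() {
        return Double.toString(value);
    }
}

// Hypothetical second AST class, to show more than one mapping.
final class IdentifierAST extends BaseAST {
    private final String name;

    IdentifierAST(ParseNode node) {
        this.name = node.text;
    }

    @Override
    String generateCode() {
        return "read(\"" + name + "\")";
    }
}

// Maps rule names to AST constructors; nothing here touches the grammar.
final class AstRegistry {
    private final Map<String, Function<ParseNode, BaseAST>> factories = new HashMap<>();

    AstRegistry register(String ruleName, Function<ParseNode, BaseAST> factory) {
        factories.put(ruleName, factory);
        return this;
    }

    BaseAST build(ParseNode node) {
        Function<ParseNode, BaseAST> factory = factories.get(node.ruleName);
        if (factory == null) {
            throw new IllegalArgumentException("No AST class registered for rule: " + node.ruleName);
        }
        return factory.apply(node);
    }
}
```

Usage would look like new AstRegistry().register("numberExpression", NumberExpressionAST::new). Adding a rule is then one register(...) call, and an unmapped rule fails loudly instead of silently falling through.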

I've been scouring the web, and I found tree rewrite rules (the -> syntax in the grammar), but I can't figure out how to keep all this logic out of the grammar, especially the code that sets up the templates and generates the target output. I'm OK with walking the tree multiple times.

I thought that maybe I could use the output=AST option and then provide my own AST classes extending CommonTree. I'll admit my grasp of ANTLR is very primitive, so forgive my ignorance. Every tutorial I follow does all of this inline in the grammar, which to me is totally insane and hard to maintain.

Can someone point me to a way of accomplishing something similar?

Goal: keep AST/codegen/template logic out of the grammar.

EDIT ---------------------------------------------

I've resorted to tracing through ANTLR's own source code (ANTLR is built with itself), and I'm seeing similar classes like BlockAST, RuleAST, etc., all inheriting from CommonTree. I haven't figured out the important part yet: how they're actually used.

From looking around, I noticed you can give a token reference its own node class (ANTLR 3's heterogeneous tree syntax):

identifier
    : IDENTIFIER<AnyJavaClassIWantAST>
    ;

You can't do exactly the same for parser rules, but if you define an imaginary token to represent the rule as a whole, you can use a rewrite rule like so:

declaration
    : type identifier -> ^(SOME_PARSE_RULE<AnyJavaClassIWantAST> type identifier)
    ;

All this is closer to what I want, but ideally I shouldn't have to litter the grammar with these. Is there any way to put them somewhere else?
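ANTLR 3 does offer a hook for this outside the grammar: subclass CommonTreeAdaptor, override create(Token) to return your own CommonTree subclasses based on the token type, and hand the adaptor to the generated parser with setTreeAdaptor(...). The sketch below shows only the shape of that idea in plain Java so it runs without the ANTLR runtime; Node, NodeFactory, MyNodeFactory and the token-type constants are hypothetical stand-ins, not ANTLR classes.

```java
// Hypothetical minimal tree node, standing in for ANTLR 3's CommonTree.
class Node {
    final int type;
    final String text;

    Node(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Hypothetical heterogeneous node subclasses.
class IdentifierNode extends Node {
    IdentifierNode(int type, String text) { super(type, text); }
}

class DeclarationNode extends Node {
    DeclarationNode(int type, String text) { super(type, text); }
}

// Default factory: every token becomes a plain Node (the role CommonTreeAdaptor plays).
class NodeFactory {
    Node create(int type, String text) {
        return new Node(type, text);
    }
}

// Custom factory kept outside the grammar: picks a subclass per token type
// and falls back to the default behaviour for everything else.
class MyNodeFactory extends NodeFactory {
    // Hypothetical token-type constants; a real parser would use the ones ANTLR generates.
    static final int IDENTIFIER = 4;
    static final int DECLARATION = 5;

    @Override
    Node create(int type, String text) {
        switch (type) {
            case IDENTIFIER:
                return new IdentifierNode(type, text);
            case DECLARATION:
                return new DeclarationNode(type, text);
            default:
                return super.create(type, text);
        }
    }
}
```

The design point is the fallback to super.create(...): only the token types you care about get a custom class, and the grammar needs no <...> annotations at all.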

asked Nov 27 '12 04:11

jayphelps

1 Answer

Here is a contrived example that uses a handful of ANTLR 4 features that go a long way toward separating the grammar from the output language, mainly alternative labels and the generated listener. The grammar can represent a few trivial bits of code, but it contains no target-language references at all: there are no embedded actions, and even whitespace is discarded with the language-neutral -> skip lexer command rather than a call to skip(). The test class converts the input to some Java output using the generated listener.

I avoided using anything that I couldn't get to work on the first couple of tries, so don't consider this an exhaustive example by any means.

Simplang.g4

grammar Simplang;


compilationUnit : statements EOF;
statements      : statement+;
statement       : block #BlockStatement 
                | call  #CallStatement
                | decl  #DeclStatement
                ;
block           : LCUR statements RCUR;    
call            : methodName LPAR args=arglist? RPAR SEMI;
methodName      : ID;
arglist         : arg (COMMA arg)*;
arg             : expr;    
decl            : VAR variableName EQ expr SEMI;
variableName    : ID;
expr            : add_expr;     
    
add_expr        : lhs=primary_expr (add_op rhs=primary_expr)*;
add_op          : PLUS | MINUS;    
primary_expr    : string=STRING
                | id=ID
                | integer=INT
                ;    
    
VAR: 'var';   
ID: ('a'..'z'|'A'..'Z')+;
INT: ('0'..'9')+;
STRING: '\'' ~('\r'|'\n'|'\'')* '\'';
SEMI: ';';
LPAR: '(';
RPAR: ')';
LCUR: '{';
RCUR: '}';
PLUS: '+';
MINUS: '-';    
COMMA: ',';
EQ: '=';
WS: (' '|'\t'|'\f'|'\r'|'\n') -> skip;

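Along with the lexer and parser, ANTLR 4 generates a listener interface (SimplangListener) and a default implementation with empty methods (SimplangBaseListener). Because every alternative of statement is labeled, there are enter/exit methods for BlockStatement, CallStatement and DeclStatement instead of for statement itself. Here's the interface generated for the grammar above.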

SimplangListener.java

import org.antlr.v4.runtime.tree.ParseTreeListener;

public interface SimplangListener extends ParseTreeListener {
    void enterArglist(SimplangParser.ArglistContext ctx);
    void exitArglist(SimplangParser.ArglistContext ctx);
    void enterCall(SimplangParser.CallContext ctx);
    void exitCall(SimplangParser.CallContext ctx);
    void enterCompilationUnit(SimplangParser.CompilationUnitContext ctx);
    void exitCompilationUnit(SimplangParser.CompilationUnitContext ctx);
    void enterVariableName(SimplangParser.VariableNameContext ctx);
    void exitVariableName(SimplangParser.VariableNameContext ctx);
    void enterBlock(SimplangParser.BlockContext ctx);
    void exitBlock(SimplangParser.BlockContext ctx);
    void enterExpr(SimplangParser.ExprContext ctx);
    void exitExpr(SimplangParser.ExprContext ctx);
    void enterPrimary_expr(SimplangParser.Primary_exprContext ctx);
    void exitPrimary_expr(SimplangParser.Primary_exprContext ctx);
    void enterAdd_expr(SimplangParser.Add_exprContext ctx);
    void exitAdd_expr(SimplangParser.Add_exprContext ctx);
    void enterArg(SimplangParser.ArgContext ctx);
    void exitArg(SimplangParser.ArgContext ctx);
    void enterAdd_op(SimplangParser.Add_opContext ctx);
    void exitAdd_op(SimplangParser.Add_opContext ctx);
    void enterStatements(SimplangParser.StatementsContext ctx);
    void exitStatements(SimplangParser.StatementsContext ctx);
    void enterBlockStatement(SimplangParser.BlockStatementContext ctx);
    void exitBlockStatement(SimplangParser.BlockStatementContext ctx);
    void enterCallStatement(SimplangParser.CallStatementContext ctx);
    void exitCallStatement(SimplangParser.CallStatementContext ctx);
    void enterMethodName(SimplangParser.MethodNameContext ctx);
    void exitMethodName(SimplangParser.MethodNameContext ctx);
    void enterDeclStatement(SimplangParser.DeclStatementContext ctx);
    void exitDeclStatement(SimplangParser.DeclStatementContext ctx);
    void enterDecl(SimplangParser.DeclContext ctx);
    void exitDecl(SimplangParser.DeclContext ctx);
}
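The interface-plus-empty-base-class arrangement means a listener only overrides the callbacks it cares about, while something else fires them in depth-first order. The test class below attaches its listener with addParseListener, so events fire during parsing; the usual alternative is to parse first and then call ParseTreeWalker.DEFAULT.walk(listener, tree). The plain-Java sketch here only imitates that traversal order, using a hypothetical Tree class and generic enter/exit callbacks (roughly what ParseTreeListener's enterEveryRule/exitEveryRule see) rather than the per-rule methods ANTLR generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini parse tree: a rule name, optional token text, and children.
class Tree {
    final String rule;
    final String text;
    final List<Tree> children;

    Tree(String rule, String text, Tree... children) {
        this.rule = rule;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Listener with no-op defaults, playing the role of SimplangBaseListener.
interface TreeListener {
    default void enter(Tree node) {}
    default void exit(Tree node) {}
}

final class Walker {
    // Depth-first: enter a node, walk its children left to right, then exit it.
    static void walk(TreeListener listener, Tree node) {
        listener.enter(node);
        for (Tree child : node.children) {
            walk(listener, child);
        }
        listener.exit(node);
    }
}
```

Because enter/exit pairs nest exactly like the rules, a listener can open output in one callback (for example "define(" in enterDecl) and close it in an enclosing rule's exit callback (");" in exitDeclStatement), which is what the test class relies on.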

Here's a test class that overrides a few methods in the empty listener and calls the parser.

SimplangTest.java

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;

public class SimplangTest {

    public static void main(String[] args) {

        ANTLRInputStream input = new ANTLRInputStream(
                "var x = 4;\nfoo(x, 10);\nbar(y + 10 - 1, 'x' + 'y' + 'z');");

        SimplangLexer lexer = new SimplangLexer(input);

        SimplangParser parser = new SimplangParser(new CommonTokenStream(lexer));

        parser.addParseListener(new SimplangBaseListener() {
            public void exitArg(SimplangParser.ArgContext ctx) {
                System.out.print(", ");
            }

            public void exitCall(SimplangParser.CallContext call) {
                System.out.print("})");
            }

            public void exitMethodName(SimplangParser.MethodNameContext ctx) {
                System.out.printf("call(\"%s\", new Object[]{", ctx.ID()
                        .getText());
            }

            public void exitCallStatement(SimplangParser.CallStatementContext ctx) {
                System.out.println(";");
            }

            public void enterDecl(SimplangParser.DeclContext ctx) {
                System.out.print("define(");
            }

            public void exitVariableName(SimplangParser.VariableNameContext ctx) {
                System.out.printf("\"%s\", ", ctx.ID().getText());
            }

            public void exitDeclStatement(SimplangParser.DeclStatementContext ctx) {
                System.out.println(");");
            }

            public void exitAdd_op(SimplangParser.Add_opContext ctx) {
                if (ctx.MINUS() != null) {
                    System.out.print(" - ");
                } else {
                    System.out.print(" + ");
                }
            }

            public void exitPrimary_expr(SimplangParser.Primary_exprContext ctx) {
                if (ctx.string != null) {
                    String value = ctx.string.getText();
                    System.out.printf("\"%s\"",
                            value.subSequence(1, value.length() - 1));
                } else if (ctx.id != null) {    // the ID alternative: a variable reference
                    System.out.printf("read(\"%s\")", ctx.id.getText());
                } else {
                    System.out.print(ctx.INT().getText());
                }
            }
        });

        parser.compilationUnit();
    }
}

Here's the test input hard-coded in the test class:

var x = 4;
foo(x, 10);
bar(y + 10 - 1, 'x' + 'y' + 'z');

Here's the output produced:

define("x", 4);
call("foo", new Object[]{read("x"), 10, });
call("bar", new Object[]{read("y") + 10 - 1, "x" + "y" + "z", });

It's a silly example, but it shows a few of the features that might be useful to you when building a custom AST.
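To move this one step closer to the question's goal, the listener callbacks could build small AST objects instead of printing, and a separate emitter could turn those objects into output. The sketch below shows only that second half, under stated assumptions: IntLit, StrLit, VarRef, BinOp, Call and JavaEmitter are hypothetical plain-Java classes with no ANTLR dependency, and wiring them up from exitPrimary_expr, exitAdd_expr and friends is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical AST for the Simplang example; the nodes hold data only.
interface Expr {}

final class IntLit implements Expr {
    final String digits;
    IntLit(String digits) { this.digits = digits; }
}

final class StrLit implements Expr {
    final String value; // without the surrounding quotes
    StrLit(String value) { this.value = value; }
}

final class VarRef implements Expr {
    final String name;
    VarRef(String name) { this.name = name; }
}

final class BinOp implements Expr {
    final Expr lhs;
    final String op; // "+" or "-"
    final Expr rhs;
    BinOp(Expr lhs, String op, Expr rhs) { this.lhs = lhs; this.op = op; this.rhs = rhs; }
}

final class Call {
    final String method;
    final List<Expr> args;
    Call(String method, List<Expr> args) { this.method = method; this.args = args; }
}

// All output decisions live here, not in the grammar and not in the AST classes.
final class JavaEmitter {
    String emitExpr(Expr e) {
        if (e instanceof IntLit) return ((IntLit) e).digits;
        if (e instanceof StrLit) return "\"" + ((StrLit) e).value + "\"";
        if (e instanceof VarRef) return "read(\"" + ((VarRef) e).name + "\")";
        // Only four Expr kinds exist in this sketch, so the rest must be BinOp.
        BinOp b = (BinOp) e;
        return emitExpr(b.lhs) + " " + b.op + " " + emitExpr(b.rhs);
    }

    String emitCall(Call c) {
        // Joining avoids the trailing ", " that the printing listener produces.
        String args = c.args.stream().map(this::emitExpr).collect(Collectors.joining(", "));
        return "call(\"" + c.method + "\", new Object[]{" + args + "});";
    }
}
```

Swapping JavaEmitter for a StringTemplate-backed emitter, or adding a second target, would not touch the grammar or the AST classes.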

answered Oct 06 '22 02:10

user1201210

Tags:

java

parsing

llvm

antlr

abstract-syntax-tree