Is there a simple example of using antlr4 to create an AST from java source code and extract methods, variables and comments? [closed]

Tags: java, antlr

Can someone provide a detailed example as to how I can do this using antlr4? Instructions right from installing antlr4 and its dependencies would be highly appreciated.

user3266901 Avatar asked Feb 03 '14 18:02


2 Answers

Here it is.

First, you're gonna buy the ANTLR4 book ;-)

Second, download the ANTLR4 jar and the Java grammar (http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference)

Then, change the grammar a little bit by adding this to the header

    (...)
grammar Java;

options 
{
    language = Java;
}

// starting point for parsing a java file
compilationUnit
    (...)

I'll change a little thing in the grammar just to illustrate something.

/*
methodDeclaration
    :   (type|'void') Identifier formalParameters ('[' ']')*
        ('throws' qualifiedNameList)?
        (   methodBody
        |   ';'
        )
    ;
*/
methodDeclaration
    :   (type|'void') myMethodName formalParameters ('[' ']')*
        ('throws' qualifiedNameList)?
        (   methodBody
        |   ';'
        )
    ;

myMethodName
    :   Identifier
    ;

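You see, the original grammar does not let you distinguish the method name from any other identifier, so I've commented out the original rule and added a new one that wraps the name in its own rule (myMethodName). The generated visitor then gets a dedicated visitMyMethodName method, which is how you get what you want.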
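To make the idea concrete without ANTLR on the classpath, here is a toy sketch in plain Java. The Node class, the collect helper and the hand-built tree are all made up for illustration; this is not the code ANTLR generates. It only shows why wrapping the method name in its own rule lets you pick it out from the other identifiers.

```java
import java.util.ArrayList;
import java.util.List;

public class RuleHookSketch {

    /** Toy parse-tree node: the rule that produced it, its text, and its children. */
    static final class Node {
        final String rule;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String rule, String text) {
            this.rule = rule;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Depth-first walk collecting the text of every node produced by the given rule. */
    static List<String> collect(Node node, String rule) {
        List<String> found = new ArrayList<>();
        if (node.rule.equals(rule)) {
            found.add(node.text);
        }
        for (Node child : node.children) {
            found.addAll(collect(child, rule));
        }
        return found;
    }

    /** Hand-built (hypothetical) tree for: void main(String[] args) */
    static Node sampleTree() {
        return new Node("methodDeclaration", "")
                .add(new Node("myMethodName", "main"))
                .add(new Node("formalParameters", "")
                        .add(new Node("Identifier", "String"))
                        .add(new Node("Identifier", "args")));
    }

    public static void main(String[] args) {
        Node tree = sampleTree();
        // Only the method name was produced by the dedicated rule
        System.out.println("myMethodName: " + collect(tree, "myMethodName")); // [main]
        // The remaining identifiers are still indistinguishable from each other
        System.out.println("Identifier:   " + collect(tree, "Identifier"));   // [String, args]
    }
}
```

In the real generated code the same role is played by the MyMethodNameContext class and the visitMyMethodName callback.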

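You'll have to do the same for other elements you want to retrieve. Comments, for instance, are currently just skipped by the lexer, so they never reach the parser; one option is to send them to a hidden channel (-> channel(HIDDEN)) instead of skipping them (-> skip), so the tokens stay available in the token stream. That's for you :-)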

Now, create a class like this to generate all the stubs

package mypackage;

public class Gen {

    public static void main(String[] args) {
        String[] arg0 = { "-visitor", "/home/leoks/EclipseIndigo/workspace2/SO/src/mypackage/Java.g4", "-package", "mypackage" };
        org.antlr.v4.Tool.main(arg0);
    }

}

Run Gen, and you'll get some java code created for you in mypackage.

Now create a Visitor. Actually, the visitor will parse itself in this example

package mypackage;

import java.io.FileInputStream;
import java.io.IOException;

import mypackage.JavaParser.MyMethodNameContext;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

/**
 * @author Leonardo Kenji Feb 4, 2014
 */
public class MyVisitor extends JavaBaseVisitor<Void> {

    /**
     * Main Method
     * 
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        // we'll parse this file
        ANTLRInputStream input = new ANTLRInputStream(new FileInputStream("/home/leoks/EclipseIndigo/workspace2/SO/src/mypackage/MyVisitor.java"));
        JavaLexer lexer = new JavaLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        JavaParser parser = new JavaParser(tokens);
        // see the grammar -> starting point for parsing a java file
        ParseTree tree = parser.compilationUnit();

        // extends JavaBaseVisitor<Void> and overrides the methods you're interested in
        MyVisitor visitor = new MyVisitor();
        visitor.visit(tree);
    }

    /**
     * some attribute comment
     */
    private String  someAttribute;

    @Override
    public Void visitMyMethodName(MyMethodNameContext ctx) {
        System.out.println("Method name:" + ctx.getText());
        return super.visitMyMethodName(ctx);
    }

}

and that's it.

You'll get something like

Method name:main
Method name:visitMyMethodName

PS: one more thing. While I was writing this code in Eclipse, I got a strange exception (a java.lang.VerifyError, going by the linked article). It is caused by Java 7 and can be fixed by adding the parameters shown in the screenshot to your compiler settings (thanks to this link: http://java.dzone.com/articles/javalangverifyerror-expecting).

[Screenshot of the compiler parameters]

Leo Avatar answered Sep 27 '22 18:09

grammar Criteria;

@parser::header {
  import java.util.regex.Pattern;
}

options
{
  superClass = ReferenceResolvingParser;
}

@parser::members {

  public CriteriaParser(TokenStream input, Object object) {
    this(input);
    setObject(object);
  }

}

/* Grammar rules */

reference returns [String value]
          : '$.' IDENTIFIER { $value = resolveReferenceValue($IDENTIFIER.text); }
          ;

operand returns [String value]
        : TRUE { $value = $TRUE.text; }
        | FALSE { $value = $FALSE.text; }
        | DECIMAL { $value = $DECIMAL.text; }
        | QUOTED_LITERAL  { $value = $QUOTED_LITERAL.text.substring(1, $QUOTED_LITERAL.text.length() - 1); }
        | reference { $value = $reference.value; }
        ;

operand_list returns [List value]
             @init{ $value = new ArrayList(); }
             : LBPAREN o=operand { $value.add($o.value); } (',' o=operand { $value.add($o.value); })* RBPAREN
             ;

comparison_expression returns [boolean value]
                      : lhs=operand NEQ rhs=operand { $value = !$lhs.value.equals($rhs.value); }
                      | lhs=operand EQ rhs=operand { $value = $lhs.value.equals($rhs.value); }
                      | lhs=operand GT rhs=operand { $value = $lhs.value.compareTo($rhs.value) > 0; }
                      | lhs=operand GE rhs=operand { $value = $lhs.value.compareTo($rhs.value) >= 0; }
                      | lhs=operand LT rhs=operand { $value = $lhs.value.compareTo($rhs.value) < 0; }
                      | lhs=operand LE rhs=operand { $value = $lhs.value.compareTo($rhs.value) <= 0; }
                      ;

in_expression returns [boolean value]
              : lhs=operand IN rhs=operand_list { $value = $rhs.value.contains($lhs.value); };

rlike_expression returns [boolean value]
                 : lhs=operand RLIKE rhs=QUOTED_LITERAL { $value = Pattern.compile($rhs.text.substring(1, $rhs.text.length() - 1)).matcher($lhs.value).matches(); }
                 ;

logical_expression returns [boolean value]
                   : c=comparison_expression { $value = $c.value; }
                   | i=in_expression { $value = $i.value; }
                   | l=rlike_expression { $value = $l.value; }
                   ;

chained_expression returns [boolean value]
                   : e=logical_expression { $value = $e.value; } (OR  c=chained_expression { $value |= $c.value; })?
                   | e=logical_expression { $value = $e.value; } (AND c=chained_expression { $value &= $c.value; })?
                   ;

grouped_expression returns [boolean value]
                   : LCPAREN c=chained_expression { $value = $c.value; } RCPAREN ;

expression returns [boolean value]
           : c=chained_expression { $value = $c.value; } (OR  e=expression { $value |= $e.value; })?
           | c=chained_expression { $value = $c.value; } (AND e=expression { $value &= $e.value; })?
           | g=grouped_expression { $value = $g.value; } (OR  e=expression { $value |= $e.value; })?
           | g=grouped_expression { $value = $g.value; } (AND e=expression { $value &= $e.value; })?
           ;

criteria returns [boolean value]
         : e=expression { $value = $e.value; }
         ;


/* Lexical rules */

AND : 'and' ;
OR  : 'or' ;

TRUE  : 'true' ;
FALSE : 'false' ;

EQ    : '=' ;
NEQ   : '<>' ;
GT    : '>' ;
GE    : '>=' ;
LT    : '<' ;
LE    : '<=' ;
IN    : 'in' ;
RLIKE : 'rlike' ;

LCPAREN : '(' ;
RCPAREN : ')' ;
LBPAREN : '[' ;
RBPAREN : ']' ;

DECIMAL : '-'?[0-9]+('.'[0-9]+)? ;

IDENTIFIER : [a-zA-Z_][a-zA-Z_.0-9]* ;

QUOTED_LITERAL :
                 (  '\''
                    ( ('\\' '\\') | ('\'' '\'') | ('\\' '\'') | ~('\'') )*
                 '\''  )
                ;

WS : [ \r\t\u000C\n]+ -> skip ;
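The grammar relies on a superClass named ReferenceResolvingParser that is not shown, and calls setObject(...) and resolveReferenceValue(...) on it. Below is a minimal sketch of what that resolution logic might look like, assuming the object is a tree of nested maps and the identifier is a dotted path. The class name ReferenceResolverSketch and the map-based lookup are my assumptions, not the author's code; a real superclass would also have to extend org.antlr.v4.runtime.Parser and offer a constructor taking a TokenStream.

```java
import java.util.HashMap;
import java.util.Map;

public class ReferenceResolverSketch {

    // The value handed to the two-argument parser constructor (assumed to be nested maps)
    private Object object;

    public void setObject(Object object) {
        this.object = object;
    }

    /**
     * Resolves a dotted path such as "user.name" by walking nested maps.
     * Returns null when a segment is missing or a non-map value is reached too early.
     */
    public String resolveReferenceValue(String path) {
        Object current = object;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current == null ? null : String.valueOf(current);
    }

    public static void main(String[] args) {
        Map<String, Object> user = new HashMap<>();
        user.put("name", "leo");
        Map<String, Object> root = new HashMap<>();
        root.put("user", user);
        root.put("age", 30);

        ReferenceResolverSketch resolver = new ReferenceResolverSketch();
        resolver.setObject(root);
        System.out.println(resolver.resolveReferenceValue("user.name")); // leo
        System.out.println(resolver.resolveReferenceValue("age"));       // 30
        System.out.println(resolver.resolveReferenceValue("missing.x")); // null
    }
}
```

Returning a String matches the reference rule, which declares returns [String value]; everything else about the lookup strategy is a guess.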



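import java.util.Optional;

import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.BaseErrorListener;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;

public class CriteriaEvaluator extends CriteriaBaseListener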
{

    static class CriteriaEvaluatorErrorListener extends BaseErrorListener
    {

        Optional<String> error = Optional.empty();

        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
            error = Optional.of(String.format("Failed to parse at line %d:%d due to %s", line, charPositionInLine + 1, msg));
        }

    }

    public static boolean evaluate(String input, Object argument)
    {
        CriteriaLexer lexer = new CriteriaLexer(new ANTLRInputStream(input));
        CriteriaParser parser = new CriteriaParser(new CommonTokenStream(lexer), argument);
        // Replace the default console listeners with one that records the most recent syntax error
        CriteriaEvaluatorErrorListener errorListener = new CriteriaEvaluatorErrorListener();
        lexer.removeErrorListeners();
        lexer.addErrorListener(errorListener);
        parser.removeErrorListeners();
        parser.addErrorListener(errorListener);
        CriteriaParser.CriteriaContext criteriaCtx = parser.criteria();
        if(errorListener.error.isPresent())
        {
            throw new IllegalArgumentException(errorListener.error.get());
        }
        else
        {
            return criteriaCtx.value;
        }
    }

}
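The actions embedded in the grammar are ordinary Java expressions, so their behaviour can be checked without generating a parser. The sketch below copies three of them into plain static methods (the class and method names are mine). It shows that QUOTED_LITERAL just drops the outer quotes, that GT compares strings lexicographically, and that RLIKE requires the regex to match the whole operand. The numeric comparison is a hypothetical alternative and is not part of the grammar above.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

public class ActionSemanticsSketch {

    /** Mirrors the QUOTED_LITERAL action: drop the surrounding single quotes. */
    static String unquote(String literal) {
        return literal.substring(1, literal.length() - 1);
    }

    /** Mirrors the GT action: String.compareTo, i.e. a lexicographic comparison. */
    static boolean greaterThanAsStrings(String lhs, String rhs) {
        return lhs.compareTo(rhs) > 0;
    }

    /** Hypothetical alternative (not in the grammar): compare the operands as numbers. */
    static boolean greaterThanAsNumbers(String lhs, String rhs) {
        return new BigDecimal(lhs).compareTo(new BigDecimal(rhs)) > 0;
    }

    /** Mirrors the RLIKE action: matches() succeeds only if the whole operand matches. */
    static boolean rlike(String operand, String quotedRegex) {
        return Pattern.compile(unquote(quotedRegex)).matcher(operand).matches();
    }

    public static void main(String[] args) {
        System.out.println(unquote("'abc'"));                // abc
        System.out.println(greaterThanAsStrings("10", "9")); // false: '1' sorts before '9'
        System.out.println(greaterThanAsNumbers("10", "9")); // true
        System.out.println(rlike("foobar", "'foo'"));        // false: partial match only
        System.out.println(rlike("foobar", "'foo.*'"));      // true
    }
}
```

So with the grammar as written, a criteria such as 10 > 9 evaluates to false, because DECIMAL operands are carried around as strings.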
sample Avatar answered Sep 27 '22 19:09
