
Using ANTLR for static analysis of Java source files

Tags:

java

antlr

Does anyone have a complete implementation (possibly on GitHub or Google Code) that uses an ANTLR grammar file to analyze Java source code? For example, I simply want to count the number of variables, methods, etc.

Ideally, it should also use a recent version of ANTLR.

Berlin Brown asked May 09 '12 20:05


1 Answer

I thought I'd take a crack at this over my lunch break. This may not completely solve your problem, but it might give you a place to start. The example assumes you're doing everything in the same directory.

  1. Download the ANTLR source from GitHub. The pre-compiled "complete" JAR from the ANTLR site contains a known bug. The GitHub repo has the fix.

  2. Extract the ANTLR tarball.

    % tar xzf antlr-antlr3-release-3.4-150-g8312471.tar.gz
  3. Build the ANTLR "complete" JAR.

    % cd antlr-antlr3-8312471
    % mvn -N install
    % mvn -Dmaven.test.skip=true
    % mvn -Dmaven.test.skip=true package assembly:assembly
    % cd -
  4. Download a Java grammar. There are others, but I know this one works.

  5. Compile the grammar to Java source.

    % mkdir -p com/habelitz/jsobjectizer/unmarshaller/antlrbridge/generated
    % mv *.g com/habelitz/jsobjectizer/unmarshaller/antlrbridge/generated
    % java -classpath antlr-antlr3-8312471/target/antlr-master-3.4.1-SNAPSHOT-completejar.jar org.antlr.Tool -o com/habelitz/jsobjectizer/unmarshaller/antlrbridge/generated Java.g
  6. Compile the Java source.

    % javac -classpath antlr-antlr3-8312471/target/antlr-master-3.4.1-SNAPSHOT-completejar.jar com/habelitz/jsobjectizer/unmarshaller/antlrbridge/generated/*.java
  7. Add the following source file, Main.java.

    import java.io.IOException;
    import java.util.List;
    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import com.habelitz.jsobjectizer.unmarshaller.antlrbridge.generated.*;

    public class Main {
        // Usage: java Main <TOKEN_NAME> <file.java>
        public static void main(String... args) throws NoSuchFieldException, IllegalAccessException, IOException, RecognitionException {
            // Lex and parse the source file named by args[1] into an AST.
            JavaLexer lexer = new JavaLexer(new ANTLRFileStream(args[1], "UTF-8"));
            JavaParser parser = new JavaParser(new CommonTokenStream(lexer));
            CommonTree tree = (CommonTree) (parser.javaSource().getTree());
            // Look up the numeric token type for the name in args[0], e.g. VAR_DECLARATOR.
            int type = ((Integer) (JavaParser.class.getDeclaredField(args[0]).get(null))).intValue();
            System.out.println(count(tree, type));
        }

        // Recursively counts the nodes whose token type equals type.
        private static int count(CommonTree tree, int type) {
            int count = 0;
            List children = tree.getChildren();
            if (children != null) {
                for (Object child : children) {
                    count += count((CommonTree) (child), type);
                }
            }
            return ((tree.getType() != type) ? count : count + 1);
        }
    }
  8. Compile.

    % javac -classpath .:antlr-antlr3-8312471/target/antlr-master-3.4.1-SNAPSHOT-completejar.jar Main.java
  9. Select the token type for the kind of Java construct you want to count, for example VAR_DECLARATOR, FUNCTION_METHOD_DECL, or VOID_METHOD_DECL. The generated Java.tokens file lists all available names.

    % cat com/habelitz/jsobjectizer/unmarshaller/antlrbridge/generated/Java.tokens
  10. Run it on any Java file, including the Main.java you just created.

    % java -classpath .:antlr-antlr3-8312471/target/antlr-master-3.4.1-SNAPSHOT-completejar.jar Main VAR_DECLARATOR Main.java
    6
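
Step 9 reads the names out of Java.tokens by eye, and Main then maps a name to its number via reflection on JavaParser. If you'd rather read the mapping straight from the generated file, here is a minimal sketch, assuming the usual ANTLR 3 .tokens layout of one NAME=number entry per line, with quoted literal entries such as '='=... mixed in. TokenTypes and parse are hypothetical names for illustration, not part of ANTLR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TokenTypes {
    /**
     * Turns the NAME=number lines of an ANTLR .tokens file into a map.
     * Quoted literal entries are skipped; only symbolic names such as
     * VAR_DECLARATOR are kept.
     */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> types = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and quoted literal entries such as '='=9.
            if (line.isEmpty() || line.startsWith("'")) {
                continue;
            }
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                continue; // not a NAME=number entry
            }
            types.put(line.substring(0, eq), Integer.parseInt(line.substring(eq + 1).trim()));
        }
        return types;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a generated .tokens file; args[1]: a token name.
        Map<String, Integer> types = parse(Files.readAllLines(Paths.get(args[0])));
        System.out.println(types.get(args[1]));
    }
}
```

Run against com/habelitz/jsobjectizer/unmarshaller/antlrbridge/generated/Java.tokens with VAR_DECLARATOR, it should print the same number that the JavaParser.VAR_DECLARATOR constant holds, since ANTLR generates both from the same token vocabulary.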

This is imperfect, of course. If you look closely, you may have noticed that the loop variable child in the enhanced for statement wasn't counted. To count it, you'd need the type FOR_EACH rather than VAR_DECLARATOR.
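
The count method in Main matches a single type per run. Below is a minimal sketch of the same depth-first walk that accepts a set of types instead, so VAR_DECLARATOR and FOR_EACH can be counted in one pass. To keep it runnable without ANTLR, it uses a hypothetical Node class in place of CommonTree and made-up type codes; with ANTLR you would keep CommonTree and take the real numbers from Java.tokens or the JavaParser constants.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiTypeCount {
    /** Hypothetical stand-in for an AST node: a token type plus children. */
    static final class Node {
        final int type;
        final List<Node> children;

        Node(int type, Node... children) {
            this.type = type;
            this.children = Arrays.asList(children);
        }
    }

    /** Counts, depth-first, the nodes whose type is any of the given types. */
    static int count(Node tree, Set<Integer> types) {
        int total = types.contains(tree.type) ? 1 : 0;
        for (Node child : tree.children) {
            total += count(child, types);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up type codes for illustration only.
        final int BLOCK = 1, VAR_DECLARATOR = 4, FOR_EACH = 7;
        Node tree = new Node(BLOCK,
                new Node(VAR_DECLARATOR),
                new Node(FOR_EACH, new Node(BLOCK, new Node(VAR_DECLARATOR))));
        Set<Integer> wanted = new HashSet<>(Arrays.asList(VAR_DECLARATOR, FOR_EACH));
        System.out.println(count(tree, wanted)); // prints 3
    }
}
```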

You'll need a good understanding of the elements of Java source, and you'll need to make reasonable guesses about how they map to the rules of this particular grammar.

You also won't be able to count references. Declarations are easy, but counting uses of a field, for example, requires reference resolution. Does p.C.f refer to a static field f of a class C in a package p, or to an instance field f of the object stored in a static field C of a class p? Basic parsers don't resolve references for a language as complex as Java, because the general case can be very difficult. If you want this level of control, you'll need a compiler (or something close to one). The Eclipse compiler is a popular choice.
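
A small illustration of why a dotted name can't be classified from syntax alone. Java's obscuring rule (JLS 6.4.2) prefers a variable over a type with the same simple name, so in the sketch below Math.max is a field access on a local variable rather than anything in java.lang.Math. The Holder class and the local named Math are contrived, hypothetical names chosen only to show the effect.

```java
public class Obscuring {
    /** Hypothetical class whose instance field happens to be named max. */
    static class Holder {
        int max = 7;
    }

    static int readMax() {
        // This local variable obscures the type java.lang.Math, so Math.max
        // below reads the field of this Holder object. A parser sees the same
        // Name.name shape either way; only name resolution can tell them apart.
        Holder Math = new Holder();
        return Math.max;
    }

    public static void main(String[] args) {
        System.out.println(readMax()); // prints 7
    }
}
```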

I should also mention that you have other options besides ANTLR. JavaCC is another parser generator. The static analysis tool PMD, which uses JavaCC as its parser generator, allows you to write custom rules that could be used for the kinds of counts you indicated.
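
Along the lines of using a compiler, the JDK itself ships one with a public tree API (the com.sun.source packages in the jdk.compiler module), which the answer does not mention. Here is a minimal sketch, assuming JDK 9 or later and a full JDK rather than a JRE (ToolProvider.getSystemJavaCompiler() returns null without one). It stops after parsing, so like the ANTLR version it counts declarations, not resolved references; resolving them would mean running javac's attribution phase via JavacTask.analyze(). JavacCount is a hypothetical class name.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class JavacCount {
    /** Returns {variable declarations, method and constructor declarations}. */
    static int[] count(String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null on a JRE
        // In-memory source file; the URI only needs to look like a .java path.
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, Collections.singletonList(file));
        int[] counts = new int[2];
        try {
            // parse() builds syntax trees only; no type checking is done.
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitVariable(VariableTree node, Void p) {
                        counts[0]++; // fields, parameters, locals, for-each variables
                        return super.visitVariable(node, p);
                    }

                    @Override
                    public Void visitMethod(MethodTree node, Void p) {
                        counts[1]++; // methods and explicitly declared constructors
                        return super.visitMethod(node, p);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String source = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        int[] counts = count(source);
        System.out.println("variables: " + counts[0] + ", methods: " + counts[1]);
    }
}
```

Unlike the ANTLR grammar, this counts parameters and for-each loop variables as variables too, because javac represents all of them as VariableTree nodes.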

Nathan Ryan answered Oct 06 '22 19:10