
Parsing Java Source Code

I have been asked to develop software that can create a flow chart / control-flow diagram from input Java source code. I started researching it and arrived at the following.

To create a flow chart or control flow, I have to recognize the control statements and function calls in the given source code. I see two ways of recognizing them:

  1. Parse the source code by writing my own grammar (a complex solution, I think). I am thinking of using ANTLR for this.
  2. Read the input source files as text and search for specific patterns (this may become inefficient).
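For illustration, option 2 might look like the minimal sketch below. It uses only `java.util.regex`; the class name `KeywordScan` and the keyword list are made up for this example. Besides any efficiency concern, a plain text search cannot tell code from string literals or comments, so it reports keywords that are not control statements at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only: a naive keyword search over raw text.
class KeywordScan {

    // Matches a control keyword as a whole word; it knows nothing about Java syntax.
    private static final Pattern CONTROL =
            Pattern.compile("\\b(if|else|for|while|do|switch|return)\\b");

    /** Returns every keyword hit in the text, in order of appearance. */
    static List<String> keywords(String source) {
        List<String> found = new ArrayList<>();
        Matcher matcher = CONTROL.matcher(source);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Only the second line contains a real control statement.
        String source = "String s = \"if\"; // while loading\n"
                      + "if (ok) { run(); }";
        System.out.println(keywords(source)); // prints [if, while, if]
    }
}
```

The input has a single real `if` statement, yet the scan reports three hits: one from the string literal, one from the comment, and one from the actual statement.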

Am I right here, or am I missing something fundamental and simple? Which approach would take less time and do the work efficiently? Any other suggestions are welcome, especially more efficient approaches, because the input source code may span multiple files and can be fairly complex.

I am comfortable with .NET languages, but this is my first big project in Java. I have basic knowledge of compiler design, so writing grammars should not be impossible for me.

Sorry if I am being unclear. Please ask for any clarifications.

asked Mar 31 '11 08:03 by Sudh



3 Answers

I'd go with Antlr and use an existing Java grammar: https://github.com/antlr/grammars-v4

answered Nov 11 '22 06:11 by Peter Knego


Tools that handle Java code usually decide first whether to process Java source or Java bytecode (class files). That is a strategic decision and depends on your use case; I could imagine either for flow chart generation. Once you have decided that question, there are already several frameworks and libraries that can help. For bytecode engineering there are ASM, Javassist, Soot, and BCEL (which seems to be dead). For parsing and analyzing the Java language there are Polyglot, the Eclipse compiler, and javac. All of these include a complete compiler front end for Java and are open source.

I would try to avoid writing my own parser for Java. I did that once. Java has a rather complex grammar, although ready-made grammars can be found elsewhere. The real work begins with name and type resolution, and you need both if you want to generate graphs that cover more than one method body.
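To make the javac option more concrete, here is a minimal sketch, assuming a full JDK is installed so that the `jdk.compiler` module and its Compiler Tree API (`com.sun.source`) are available. It only parses, with no name or type resolution, and lists control statements and method calls in source order. The names `ControlFlowLister`, `StringSource`, and `Collector` are made up for this illustration.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.DoWhileLoopTree;
import com.sun.source.tree.EnhancedForLoopTree;
import com.sun.source.tree.ForLoopTree;
import com.sun.source.tree.IfTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.SwitchTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical class for illustration: lists control statements and calls found by javac's parser.
class ControlFlowLister {

    /** In-memory source file, so the sketch needs no files on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Adds a label per control statement or call, then keeps walking into child nodes. */
    static final class Collector extends TreeScanner<Void, List<String>> {
        @Override public Void visitIf(IfTree node, List<String> out) {
            out.add("if");
            return super.visitIf(node, out);
        }
        @Override public Void visitWhileLoop(WhileLoopTree node, List<String> out) {
            out.add("while");
            return super.visitWhileLoop(node, out);
        }
        @Override public Void visitDoWhileLoop(DoWhileLoopTree node, List<String> out) {
            out.add("do-while");
            return super.visitDoWhileLoop(node, out);
        }
        @Override public Void visitForLoop(ForLoopTree node, List<String> out) {
            out.add("for");
            return super.visitForLoop(node, out);
        }
        @Override public Void visitEnhancedForLoop(EnhancedForLoopTree node, List<String> out) {
            out.add("for-each");
            return super.visitEnhancedForLoop(node, out);
        }
        @Override public Void visitSwitch(SwitchTree node, List<String> out) {
            out.add("switch");
            return super.visitSwitch(node, out);
        }
        @Override public Void visitMethodInvocation(MethodInvocationTree node, List<String> out) {
            // Unresolved text of the callee, e.g. "helper" or "System.out.println".
            out.add("call " + node.getMethodSelect());
            return super.visitMethodInvocation(node, out);
        }
    }

    /** Parses one compilation unit (syntax only) and returns the collected labels. */
    static List<String> list(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; a JDK is required");
        }
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                Collections.singletonList(new StringSource(className, code)));
        List<String> out = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new Collector().scan(unit, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

Calling `ControlFlowLister.list("Demo", source)` on a class whose method contains an `if` that calls `helper()`, followed by a `while` loop, should yield labels such as `if`, `call helper`, and `while`. Turning such labels into an actual flow graph, and following calls across methods and files, still needs the name and type resolution described above.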

answered Nov 11 '22 07:11 by jmg


Eclipse has a library (JDT) for parsing source code and building an abstract syntax tree (AST) from it, which lets you extract what you need.

See here for a tutorial http://www.vogella.de/articles/EclipseJDT/article.html

See here for api http://help.eclipse.org/indigo/topic/org.eclipse.jdt.doc.isv/reference/api/org/eclipse/jdt/core/dom/package-summary.html#package_description

answered Nov 11 '22 08:11 by user1309411