
How to convert source code to an XML-based representation of the AST?

I want to get an XML representation of the AST of Java and C code. I already asked this question 3 months ago, but the suggested solutions didn't work well for me:

  • srcML seems to be a good solution for this problem, but it does not support line numbers and columns, and I need that feature.
  • About Elsa, to quote: "There is ongoing effort to export the Elsa AST as an XML document; we expect to be able to advertise this in the next public release."
  • DMS: I didn't understand it.
  • For Java specifically, there is JavaML, which supports line numbers, but its SourceForge page doesn't list any files.

Question: is there software available that converts an AST into XML and supports line numbers (and columns), especially for Java and C/C++? Is there an alternative to JavaML and srcML?

PS: I don't want parser generators. I hope to find a tool that can be used on the console by typing ./my-xml-generator Test.java (or something like that). A Java implementation would be great too.

autobiographer asked May 12 '10 10:05



2 Answers

A bit late, but here is one: http://xmltranslator.appspot.com/sourcecodetoxml.html

I implemented it myself, and it converts PHP and Java to XML. It's free, so enjoy!

Oana.

Veni_Vidi_Vici answered Nov 08 '22 11:11



What didn't you understand about DMS?

It exists.

It has compiler-accurate parsers/frontends for C, C++, Java, C#, COBOL (and many other languages).

It automatically builds full Abstract Syntax Trees for whatever it parses. Each AST node is stamped with the file/line/column of the token that represents the start of that node, and the final column can be computed by a DMS API call.

It has a built-in option to generate XML from the ASTs, complete with node type, source position (as above), and any associated literal value. The command line call is:

 run DMSDomainParser ++XML  <path_to_your_file>

You can see what such an XML result looks like for Java.
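To make the idea of "AST nodes stamped with line/column, dumped as XML" concrete, here is a minimal sketch. It is not DMS output and does not use DMS: it swaps in Python's standard `ast` module (which parses Python source, not Java or C) purely because its nodes carry `lineno` and `col_offset`. The element and attribute names (`line`, `column`, `value`) and the helper names `node_to_xml` and `source_to_xml` are assumptions made up for this illustration.

```python
import ast
import xml.etree.ElementTree as ET


def node_to_xml(node: ast.AST) -> ET.Element:
    """Convert one AST node (and, recursively, its children) to an XML element."""
    # The element name is the node type, e.g. Module, Assign, BinOp.
    elem = ET.Element(type(node).__name__)
    # Only statements and expressions carry positions; contexts and operators do not.
    if hasattr(node, "lineno"):
        elem.set("line", str(node.lineno))
        # col_offset is 0-based, so shift it to a 1-based column.
        elem.set("column", str(node.col_offset + 1))
    # Record the literal value for constant nodes (hypothetical attribute name).
    if isinstance(node, ast.Constant):
        elem.set("value", repr(node.value))
    for child in ast.iter_child_nodes(node):
        elem.append(node_to_xml(child))
    return elem


def source_to_xml(source: str) -> str:
    """Parse source text and return its AST serialized as an XML string."""
    return ET.tostring(node_to_xml(ast.parse(source)), encoding="unicode")


if __name__ == "__main__":
    print(source_to_xml("x = 1 + 2\n"))
```

Even for this one-line input the result contains several elements, which hints at how quickly such dumps grow.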

You probably don't really want what you are wishing for. A 1000-line C program may pull in 100K lines of #include file content. Each line produces between 5 and 10 nodes. The DMS XML output is succinct and each node takes only one line, so you are looking at roughly 1 million lines of XML of about 60 characters each, or about 60 million characters. That's a big file, and you probably don't want to process it with an XML-based tool.
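The back-of-the-envelope estimate above can be written out as a small calculation. This is only a sketch of the arithmetic using the figures quoted in the answer; the function name `estimate_xml_size` and its parameters are made up for illustration.

```python
def estimate_xml_size(source_lines: int, nodes_per_line: int, chars_per_xml_line: int):
    """Return (xml_lines, xml_chars), assuming one XML line per AST node."""
    xml_lines = source_lines * nodes_per_line
    return xml_lines, xml_lines * chars_per_xml_line


if __name__ == "__main__":
    program_lines = 1_000      # the program itself
    include_lines = 100_000    # text pulled in via #include
    total = program_lines + include_lines

    # The answer quotes 5-10 nodes per line and about 60 characters per XML line.
    for nodes_per_line in (5, 10):
        lines, chars = estimate_xml_size(total, nodes_per_line, 60)
        print(f"{nodes_per_line} nodes/line: {lines:,} XML lines, {chars:,} characters")
```

With these inputs the range is about 0.5 to 1 million XML lines, or roughly 30 to 60 million characters, so the figures in the answer correspond to the upper end.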

DMS itself provides a vast amount of infrastructure for manipulating the ASTs it builds: traversing, pattern matching (against patterns coded essentially in source form), source-to-source transforms, control flow, data flow, points-to analysis, global call graphs. You'll find it amazingly hard to replicate all this machinery, and you're likely to need it to do anything interesting.

Moral: it is much better to use something like DMS to manipulate the AST directly than to fight with XML.

Full disclosure: I'm the architect behind DMS.

Ira Baxter answered Nov 08 '22 12:11
