
Is there any standard way to store abstract syntax trees in files?

I am looking for a way to "dump" abstract syntax trees into files, so that code can be parsed by a compiler and then stored in a language- and compiler-independent way. So far I have been unable to find any widely recognized way of doing this. Does one exist?

drakide asked Nov 21 '12 07:11




1 Answer

There are no standards for storing ASTs or, more importantly from your point of view, for sharing them among tools. The reason is that ASTs depend on grammars (which vary; C has "many", depending on the specific compiler and version) and on the parsing technology.

There have been lots of attempts to define universal AST forms across multiple languages, but none of them has really worked; the semantics of the operators vary too much. (Consider just "+": what does it really mean? In Fortran you can add arrays; in Java you can "add" strings.)

However, one can write out specific ASTs rather easily. A simple approach is a notation in which each node is written out together with its children, recursively, inside some form of nested "parentheses".

Lisp S-expressions are a common way to do this. You can see an example of the S-expression style generated by our tools.
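To make the nested-parentheses idea concrete, here is a minimal sketch in Python. The `Node` class, its field names, and the node kinds (`add`, `mul`, `ident`, `int`) are all hypothetical, made up for illustration; this is not the format produced by the tools mentioned above, just one plausible way to write a tree out recursively.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical AST node: a kind, an optional literal value, and children."""
    kind: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def to_sexpr(node: Node) -> str:
    """Render a node and, recursively, its children as a nested S-expression."""
    parts = [node.kind]
    if node.value is not None:
        # json.dumps gives a double-quoted, escaped string literal
        parts.append(json.dumps(node.value))
    parts.extend(to_sexpr(child) for child in node.children)
    return "(" + " ".join(parts) + ")"


# Assumed tree for the expression  a + 2 * b
tree = Node("add", children=[
    Node("ident", "a"),
    Node("mul", children=[Node("int", "2"), Node("ident", "b")]),
])

print(to_sexpr(tree))
# prints: (add (ident "a") (mul (int "2") (ident "b")))
```

Note that the node kinds are still specific to whatever grammar produced the tree, so a dump like this is easy to write and read back but is not portable across tools by itself.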

People have used XML for this, too, but it is pretty bulky. You can see an XML output example here.
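For comparison, the same kind of tree can be written as XML with Python's standard `xml.etree.ElementTree` module. Again, the `Node` class, the node kinds, and the choice of a `value` attribute are assumptions made for this sketch, not an established schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical AST node: a kind, an optional literal value, and children."""
    kind: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def to_xml(node: Node) -> ET.Element:
    """Convert a node and, recursively, its children into an XML element."""
    elem = ET.Element(node.kind)
    if node.value is not None:
        elem.set("value", node.value)  # literal stored as an attribute (assumed layout)
    for child in node.children:
        elem.append(to_xml(child))
    return elem


# Assumed tree for the expression  a + 2 * b
tree = Node("add", children=[
    Node("ident", "a"),
    Node("mul", children=[Node("int", "2"), Node("ident", "b")]),
])

xml_text = ET.tostring(to_xml(tree), encoding="unicode")
print(xml_text)
```

Every node costs at least a tag name plus angle brackets, and nodes with children need a closing tag as well, which is where the bulk comes from on large trees.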

Ira Baxter answered Oct 08 '22 12:10