I am searching for a way to "dump" abstract syntax trees into files so that code can be parsed with a compiler and then stored in a language- and compiler independent way. Yet I was unable to find any widely recognized way for doing this. Does such a way exist?
Once we have an Abstract Syntax Tree we can both manipulate it as well as "print" it into a different type of code. Using ASTs to manipulate code is safer than doing those operations directly on the code as text or on a list of tokens.
ASTs are needed because of the inherent nature of programming languages and their documentation. Languages are often ambiguous by nature. In order to avoid this ambiguity, programming languages are often specified as a context-free grammar (CFG).
An Abstract Syntax Tree describes the parse tree logically. It does not need to contain all the syntactical constructs required to parse some source code (white spaces, braces, keywords, parenthesis etc). That's why Parse Tree is also called Concrete Syntax Tree while the AST is called Syntax Tree .
In the first step of lexical analysis, the code will be broken down into smaller pieces called tokens. In the next step of syntax analysis, the tokens are converted into a tree called the abstract syntax tree. The structure of the tree is similar to the code structure.
There are no standards for storing ASTs, or more importantly from your point of view, sharing them among tools. The reason is that ASTs are dependent on grammars (which vary; C has "many" depending on which specific compiler and version) and parsing technology.
There have been lots of attempts to define universal AST forms across multiple languages but none of them have really worked; the semantics of the operators varies too much. (Consider just "+": what does it really mean? In Fortran, you can add arrays, in Java, you can "add" strings).
However, one can write out specific ASTs rather easily. A simple means is to use some kind of notation in which a node is identified along with its recursive children using some kind of nested "parentheses".
Lisp S-expressions are a common way to do this. You can see an example of the S-expression style generated by our tools.
People have used XML for this, too, but it is pretty bulky. You can see an XML output example here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With