
Generate source code from AST with Antlr4 and StringTemplates

If I have an AST and modify it, can I use StringTemplates to generate the source code for the modified AST?

I have successfully implemented my grammar for Antlr4. It generates the AST of the source code, and I use the Visitor class to perform the desired actions. I then modify something in the AST and would like to generate the source code for that modified AST (I believe this is called pretty-printing).

Does Antlr's built-in StringTemplates have all the functionality to do this? Where should one start (practical advice is very welcome)?

50k4 asked Jul 07 '16 09:07




1 Answer

You can walk the tree and use string templates (or even plain string prints) to emit text that, to some extent, reproduces the source text.
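A minimal sketch of that tree walk in Python. It assumes a hand-rolled AST of nested tuples, not an ANTLR parse tree, and uses the standard library's `string.Template` as a stand-in for StringTemplate. The node kinds (`assign`, `binop`, `name`, `num`) and the `emit` function are hypothetical names made up for illustration.

```python
from string import Template

# One template per node kind; the node kinds are invented for this example.
TEMPLATES = {
    "assign": Template("$target = $value;"),
    "binop": Template("($left $op $right)"),
}

def emit(node):
    """Recursively render a node of the toy AST as source text."""
    kind = node[0]
    if kind in ("name", "num"):
        return str(node[1])
    if kind == "binop":
        _, op, left, right = node
        return TEMPLATES["binop"].substitute(
            op=op, left=emit(left), right=emit(right))
    if kind == "assign":
        _, target, value = node
        return TEMPLATES["assign"].substitute(target=target, value=emit(value))
    raise ValueError(f"unknown node kind: {kind}")

tree = ("assign", "x", ("binop", "+", ("name", "a"), ("num", 2)))
print(emit(tree))  # x = (a + 2);
```

Note that the `binop` template always emits parentheses, whether or not the original source had them. That is one small instance of the output only approximating the original text.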

But you will find that reproducing the source text in a realistic way is harder than this suggests. If you want to get back code that the original programmer will not reject, you need to:

  • Preserve comments. I don't think ANTLR ASTs do this.
  • Generate layout that preserves the original indentation.
  • Preserve the radix, leading-zero count, and other "format" properties of literal values.
  • Regenerate strings with reasonable escapes.
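The last two points can be sketched as follows, assuming each literal node keeps the exact spelling it had in the source next to its value. The `IntLiteral` class, `render_int`, `render_string`, and the C-like escape set are all assumptions made for this example, not part of ANTLR or StringTemplate.

```python
from dataclasses import dataclass

@dataclass
class IntLiteral:
    value: int
    text: str  # exact spelling from the source, e.g. "0x00FF"

def render_int(lit, new_value=None):
    """Return the original spelling if unchanged, else reuse its radix and width."""
    if new_value is None or new_value == lit.value:
        return lit.text
    if lit.text.lower().startswith("0x"):
        digits = len(lit.text) - 2                  # keep the leading-zero width
        upper = lit.text[2:].upper() == lit.text[2:]
        return lit.text[:2] + format(new_value, f"0{digits}{'X' if upper else 'x'}")
    return str(new_value)                           # plain decimal fallback

def render_string(value):
    """Re-escape a string value for a C-like target language (assumed escapes)."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}
    return '"' + "".join(escapes.get(ch, ch) for ch in value) + '"'

mask = IntLiteral(value=255, text="0x00FF")
print(render_int(mask))        # 0x00FF
print(render_int(mask, 16))    # 0x0010
```

This sketch handles only non-negative hexadecimal and decimal literals; octal, binary, digit separators, and type suffixes would each need their own case.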

Doing all of this well is tricky. See my SO answer "How to compile an AST back to source code" for more details. (Weirdly, the ANTLR guy suggests not using an AST at all; I'm guessing this is because string templates only work on ANTLR parse trees, whose structure ANTLR understands, vs. ASTs, which are whatever you home-rolled.)

If you get all of this right, what you are likely to discover is that modifying the parse tree/AST is harder than it looks. For almost any interesting task on complex languages, you need information that is not trivial to extract from the tree (e.g., what is the meaning of this identifier? Where is this variable used?). I call this the problem of Life After Parsing. My main point is that it takes a lot of machinery to modify ASTs and regenerate code; be aware of the size of your project.
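To make the "where is this variable used?" question concrete, here is a purely syntactic sketch over a toy tuple-based tree (the node shapes and the `find_uses` helper are hypothetical). It finds every `name` node with a given spelling, which is not the same as finding every use of a given variable: two unrelated variables that share a name in different scopes would be conflated. Telling them apart requires a symbol table with scope rules, which is the kind of extra machinery meant above.

```python
def find_uses(node, name, path=()):
    """Yield the path (child indexes) of every ("name", name) node in the toy tree."""
    if not isinstance(node, tuple):
        return  # plain strings and numbers are leaf payloads, not nodes
    if node[0] == "name" and node[1] == name:
        yield path
        return
    for index, child in enumerate(node[1:], start=1):
        yield from find_uses(child, name, path + (index,))

tree = ("block",
        ("assign", "x", ("num", 1)),
        ("assign", "y", ("binop", "+", ("name", "x"), ("name", "x"))))

uses = list(find_uses(tree, "x"))
print(uses)  # [(2, 2, 2), (2, 2, 3)]
```

The assignment target `x` in the first statement is not reported, because in this toy tree a target is a bare string and not a `name` node.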

Ira Baxter answered Oct 07 '22 16:10
