Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Developing Abstract Syntax Tree

I've scoured the internet looking for some newbie information on developing a C# Abstract Syntax Trees but I can only find information for people already 'in-the-know'. I am a line-of-business application developer so topics like these are a bit over my head, but this is for my own education so I'm willing to spend the time and learn whatever concepts are necessary.

Generally, I'd like to learn about the techniques behind developing an abstract representation of code from a code string. More specifically, I'd like to be able to use this AST to do C# syntax highlighting. (I realize that syntax highlighting doesn't necessary need an AST, but this seems like a good opportunity to learn some "compiler"-level techniques.)

I apologize if this question is a bit broad, but I'm not sure how else to ask.

Thanks!

like image 869
Vince Fedorchak Avatar asked May 21 '12 00:05

Vince Fedorchak


People also ask

How do you create an abstract syntax tree?

The Abstract Syntax Tree is generated using both the list of tokens (from the lexical analysis) and the source code. The AST is generated during the syntax analysis stage of the compilation. Any syntax error would be detected and a syntax error message would then be returned, stopping the compilation process.

How do you create an abstract syntax tree in Python?

The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like. An abstract syntax tree can be generated by passing ast. PyCF_ONLY_AST as a flag to the compile() built-in function, or using the parse() helper provided in this module.

Why abstract syntax tree is important?

Once we have an Abstract Syntax Tree we can both manipulate it as well as "print" it into a different type of code. Using ASTs to manipulate code is safer than doing those operations directly on the code as text or on a list of tokens. Manipulating text is always dangerous; it shows the least amount of context.

What is an abstract syntax tree in compiler design?

An Abstract Syntax Tree, or AST, is a tree representation of the source code of a computer program that conveys the structure of the source code. Each node in the tree represents a construct occurring in the source code.


2 Answers

First you need to understand what parsing is, and what abstract syntax trees are. For this, you can consult Wikipedia on abstract syntax trees for a first look.

You really need to spend some time with a compiler text book to understand how abstract syntax trees are related to parsing, and can be constructed while parsing; the classic reference is Aho/Ullman/Sethi's "Compilers" book (easily found on the web). You may find the SO answer to Are there any "fun" ways to learn about Languages, Grammars, Parsing and Compilers? instructive.

Once you understand how to build an AST for a simple grammar, you can then turn your attention to something like C#. The issue here is sheer scale; it is one thing to play with a toy language with 20 grammar rules. It is another to work with grammar of several hundred or a thousand rules. Experience will small ones will make it a lot easier to understand how the big ones are put together, and how to live with them.

You probably don't want to build your own C# grammar (or implement the one from the C# standard); its quite a lot of work. You can get available tools that will hand you C# ASTs (Roslyn has already been mentioned; ANTLR has a C# parser, there are many more).

It is true that you might use an AST for syntax highlighting (although that is probably killing a gnat with a sledgehammer). What most people don't think much about (but the compiler books emphasize), is what happens after you have an AST; mostly they aren't useful by themselves. You actually need a lot more machinery to do anything interesting. Rather than repeat this over and over (I keep seeing the same kind of questions), you can see my discussion on Life After Parsing for more details.

like image 58
Ira Baxter Avatar answered Oct 22 '22 07:10

Ira Baxter


You should probably take a look at this talk by Phil Trelford:

Write your own compiler in 24 hours

This man is a genius, and will leave you fired up to learn about compilers. He explains it literally easily enough for a five year old to understand. The five year old in question is his son, so probably has an unfair advantage, but five is five.

like image 32
Jonathan Avatar answered Oct 22 '22 05:10

Jonathan