Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Abstract Syntax Tree representation in C++

Tags:

I already have the tokenizer interface which creates a list of tokens. I have the working mechanism for the parser. It's really unique and works like a charm. The only thing that i miss is the basic structure of the AST. How the tree, the nodes and the statements should be represented on abstraction level. I don't need any implementation only a quick idea how should it look like in class hierarchy? I'm working on an object-oriented language. Yeah, i already realized that i will need two types of statements. Some value returning "expression" type statement and a non-returning, instruction flow controlling type statement. Thank you very much.

like image 735
Daniel Adamko Avatar asked Aug 12 '13 13:08

Daniel Adamko


People also ask

How do you syntax an abstract tree?

Abstract Syntax Tree is a kind of tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.

What is meant by abstract syntax tree?

In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring in the text.

Why is an abstract syntax tree used?

An AST is usually the result of the syntax analysis phase of a compiler. It often serves as an intermediate representation of the program through several stages that the compiler requires, and has a strong impact on the final output of the compiler. ASTs are also used for uses cases like static code analysis.

What is the difference between syntax tree and abstract syntax tree?

An Abstract Syntax Tree describes the parse tree logically. It does not need to contain all the syntactical constructs required to parse some source code (white spaces, braces, keywords, parenthesis etc). That's why Parse Tree is also called Concrete Syntax Tree while the AST is called Syntax Tree .


1 Answers

If your language is imperative/c-like, the common scenario begins with top-hierarchy being split in 2 supertypes:

  • Expression
  • Statement

The program is a list of statement, which is a statement itself.

You will probably want to have one class for type of statement that extends the statement base class.

A typical scenario looks like:

  • statement block ( a list of statements )
  • ite (if then else)
  • for (a for loop with its initialization statements list, check expression, increment statements, and block
  • while (similar, but only check expression
  • variable declaration
  • assignment (including += -= ++ --, you can wrap all in one class with an operator field, a lval and an rval)
  • Function call (void one)

For the expressions:

  • Bop (binary operation, anything that has 2 operands and 1 operator i.e. + - * / % | & && || == <
  • Uop (unary operation, anything that has 1 operand and 1 operator i.e. ~ !)
  • Function call (not-void ones)
  • Conditional expression ( exp ? true val : false val )

The good thing of having this 2 abstractions (expressions and statements) is that inside all your classes, you will have abstract types, and will be able to visit the AST with the visitor pattern, for example.

For example, some classes would look like this (pseudocode):

class Ite extends Statement {    Expression condition;    Statement ifBranch;    Statement elseBranch; }   class Bop extends Expression {    BOperator operator;  // +, -. * or whatever    Expression left;     // Left operand    Expression right;    // Right operand }   class StatementBlock extends Statement {    List<Statement> statements; }   class Assignment extends Statement {    AOperator assignOp;  // = += -= etc.    LVal lvalue;         // The lvalue cannot be an arbitrary expression, you will usually have a specific type for it    Expression rvalue;   // Right value } 

Also, you will need some way to represent types (for the AST, just static types are enough, if you project to implement some back-end as well, you will need some dynamic types too).

Static types can usually be specified with some enumerations, if you don't plan to support fixed-size arrays which need a size information. If you want fixed-size arrays with, size, you can implement one class for type and have the array type hold additional size information.

enum Type {    CHAR,    SHORT,    INT,    LONG,    FLOAT,    DOUBLE,    ARRAY }  class Float extends StaticType {     final Type type = Type.FLOAT; }  class Array extends StaticArray {     final Type type = Type.ARRAY;      int size; } 

You will then instantiate one StaticType instance for every type in the AST, for example when the user declares a variable. You will be able to use the same hierarchy if you plan to do static type-checking in the future, also.

As for running/interpreting the code in the AST form, you will need a Memory which will hold a stack/heap containing information about runtime memory. At that point you will need to store values together with their type information.

like image 163
Lake Avatar answered Oct 14 '22 19:10

Lake