Hi, I am new to compiler development and am wondering what an AST looks like. I have a small piece of code, and I used Clang to generate its AST, but I don't get much information out of it. From the looks of it, the syntax tree is exactly the same as the source, except for one struct that is added to almost every sample I test with.
Source:
class A {
public:
    int *a, *b, *c;
    int i;
    void sum() {
        a = new int[5];
        b = new int[5];
        c = new int[5];
        for (i = 0; i < 5; i++) {
            a[i] = i;
            b[i] = i;
        }
        for (i = 0; i < 5; i++) {
            c[i] = a[i] + b[i];
        }
        delete[] a; delete[] b; delete[] c;
    }
};
class B : public A {
};
int main() {
    B bclass;
    bclass.sum();
    return 0;
}
Command to generate AST:
clang++ -cc1 -ast-print ~/sum.cpp
AST output:
struct __va_list_tag {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
};
typedef struct __va_list_tag __va_list_tag;
class A {
public:
    int *a;
    int *b;
    int *c;
    int i;
    void sum() {
        this->a = new int [5];
        this->b = new int [5];
        this->c = new int [5];
        for (this->i = 0; this->i < 5; this->i++) {
            this->a[this->i] = this->i;
            this->b[this->i] = this->i;
        }
        for (this->i = 0; this->i < 5; this->i++) {
            this->c[this->i] = this->a[this->i] + this->b[this->i];
        }
        delete [] this->a;
        delete [] this->b;
        delete [] this->c;
    }
};
class B : public A {
};
int main() {
    B bclass;
    bclass.sum();
    return 0;
}
Thanks
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring in the text.
The Clang Abstract Syntax Tree: an abstract syntax tree (AST) is the structural in-memory representation of a program's source code. Clang's AST mixes syntactic-only nodes (such as parentheses) and semantic-only nodes (such as implicit conversions) into the same tree structure.
Abstract Syntax Tree is a kind of tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.
The Abstract Syntax Tree is generated using both the list of tokens (from the lexical analysis) and the source code. The AST is generated during the syntax analysis stage of the compilation. Any syntax error would be detected and a syntax error message would then be returned, stopping the compilation process.
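To make the lexing-then-parsing pipeline described above concrete, here is a minimal C++ sketch of a toy expression parser. Every name in it (Node, tokenize, Parser, toSExpr) is hypothetical and invented for illustration; it has nothing to do with how Clang is actually implemented, and the grammar is limited to +, *, parentheses, identifiers, and numbers.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST node: a leaf (identifier or number) or a binary operator.
struct Node {
    std::string text;                // operator symbol or leaf spelling
    std::unique_ptr<Node> lhs, rhs;  // both null for a leaf
};

// Lexical analysis: split the input into identifier/number/punctuation tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char ch = static_cast<unsigned char>(src[i]);
        if (std::isspace(ch)) { ++i; continue; }
        if (std::isalnum(ch)) {
            std::size_t start = i;
            while (i < src.size() && std::isalnum(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i++]));
        }
    }
    return tokens;
}

// Syntax analysis: a recursive-descent parser for
//   expr := term ('+' term)*    term := atom ('*' atom)*
//   atom := identifier | number | '(' expr ')'
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    // Returns the root of the AST, or throws on a syntax error.
    std::unique_ptr<Node> parse() {
        auto tree = expr();
        if (pos_ != toks_.size())
            throw std::runtime_error("syntax error: unexpected '" + toks_[pos_] + "'");
        return tree;
    }

private:
    bool peek(const std::string& t) const { return pos_ < toks_.size() && toks_[pos_] == t; }

    static std::unique_ptr<Node> binary(const std::string& op, std::unique_ptr<Node> l,
                                        std::unique_ptr<Node> r) {
        auto n = std::make_unique<Node>();
        n->text = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }

    std::unique_ptr<Node> expr() {
        auto left = term();
        while (peek("+")) {
            ++pos_;
            auto right = term();
            left = binary("+", std::move(left), std::move(right));
        }
        return left;
    }

    std::unique_ptr<Node> term() {
        auto left = atom();
        while (peek("*")) {
            ++pos_;
            auto right = atom();
            left = binary("*", std::move(left), std::move(right));
        }
        return left;
    }

    std::unique_ptr<Node> atom() {
        if (pos_ >= toks_.size()) throw std::runtime_error("syntax error: unexpected end of input");
        if (peek("(")) {
            ++pos_;
            auto inner = expr();
            if (!peek(")")) throw std::runtime_error("syntax error: expected ')'");
            ++pos_;
            return inner;
        }
        const std::string& t = toks_[pos_];
        if (!std::isalnum(static_cast<unsigned char>(t[0])))
            throw std::runtime_error("syntax error: unexpected '" + t + "'");
        ++pos_;
        auto leaf = std::make_unique<Node>();
        leaf->text = t;
        return leaf;
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// Render the tree as a Lisp-like string so its shape is visible.
std::string toSExpr(const Node& n) {
    if (!n.lhs) return n.text;
    return "(" + n.text + " " + toSExpr(*n.lhs) + " " + toSExpr(*n.rhs) + ")";
}
```

For example, toSExpr(*Parser(tokenize("a + b * 2")).parse()) yields (+ a (* b 2)), which shows operator precedence encoded in the tree's shape. An input such as a + * b makes parse() throw, which mirrors the parser stopping on a syntax error.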
There is some confusion between the options available:
-ast-print will pretty-print the current AST; that is, it will render the code it understood as closely as possible to what it parsed (while making some things explicit, such as the appearance of this).
-ast-dump will generate a Lisp-like representation of the current AST.
The pretty printer can be useful for checking that the AST is lossless (i.e., that it preserved the const-ness of an expression, and so on), but it is not really meant for compiler development.
If you want to hack on the compiler, you need -ast-dump, which will generate output that maps directly to the in-memory representation of the code that was parsed.
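The difference between the two renderings can be sketched with a toy tree in C++. This is only an illustration under stated assumptions: the Node struct and the helper functions are hypothetical, and the kind names are loosely borrowed from Clang rather than taken from its API. Real -ast-dump output carries much more detail, such as types, source locations, and implicit casts. With the command from the question, the real dump would presumably come from clang++ -cc1 -ast-dump ~/sum.cpp.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node; kind names are loosely borrowed from Clang,
// but this is not Clang's API.
struct Node {
    std::string kind;    // e.g. "BinaryOperator"
    std::string detail;  // operator symbol or member name, may be empty
    std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> make(std::string kind, std::string detail = "") {
    auto n = std::make_unique<Node>();
    n->kind = std::move(kind);
    n->detail = std::move(detail);
    return n;
}

// Member access on the implicit object: a bare `a` in the source
// is stored in the tree as `this->a`.
std::unique_ptr<Node> member(const std::string& name) {
    auto m = make("MemberExpr", name);
    m->children.push_back(make("CXXThisExpr"));
    return m;
}

std::unique_ptr<Node> subscript(const std::string& array, const std::string& index) {
    auto s = make("ArraySubscriptExpr");
    s->children.push_back(member(array));
    s->children.push_back(member(index));
    return s;
}

// Tree for the source expression `a[i] + b[i]` inside a member function.
std::unique_ptr<Node> buildSum() {
    auto sum = make("BinaryOperator", "+");
    sum->children.push_back(subscript("a", "i"));
    sum->children.push_back(subscript("b", "i"));
    return sum;
}

// In the spirit of -ast-print: render the tree back as C++-like source.
std::string prettyPrint(const Node& n) {
    if (n.kind == "CXXThisExpr") return "this";
    if (n.kind == "MemberExpr") return prettyPrint(*n.children[0]) + "->" + n.detail;
    if (n.kind == "ArraySubscriptExpr")
        return prettyPrint(*n.children[0]) + "[" + prettyPrint(*n.children[1]) + "]";
    // Remaining kind in this sketch: BinaryOperator.
    return prettyPrint(*n.children[0]) + " " + n.detail + " " + prettyPrint(*n.children[1]);
}

// In the spirit of -ast-dump: one node per line, indented by depth.
std::string dump(const Node& n, int depth = 0) {
    std::string out(static_cast<std::size_t>(depth) * 2, ' ');
    out += n.kind;
    if (!n.detail.empty()) out += " '" + n.detail + "'";
    out += "\n";
    for (const auto& child : n.children) out += dump(*child, depth + 1);
    return out;
}
```

Calling prettyPrint(*buildSum()) gives this->a[this->i] + this->b[this->i], the same shape as the -ast-print output in the question, while dump(*buildSum()) lists BinaryOperator, ArraySubscriptExpr, MemberExpr, and CXXThisExpr nodes one per line. The second form is less readable as source but exposes the structure a compiler pass would actually walk.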
The AST is a linked structure in memory ("tree" does not do justice to the complexity of the thing, but it's the name people use). What -ast-print produces is a textual representation of the AST. Since the person who set the option is already familiar with C/C++-like syntax, the AST is printed in a representation that follows that syntax. This is a design choice, not a happy coincidence.
If you want to see what the AST looks like when it's not printed on purpose in a familiar syntax, you could for instance look at GIMPLE, GCC's internal representation.