Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What would an AST (abstract syntax tree) for an object-oriented programming language look like?

Tags:

I'm reading about AST (abstract syntax trees) but all the samples I see use expressions such as:

a + b * c  

Which could be represented in a lispy like syntax as:

(+ a (* b c) ) 

Which will be the equivalent to:

  +  / \ a   *     / \   b   c 

My question is How an AST for a class in a OOPL would look like?

My naive attempt is for this Java code:

 class Person {       String name;      int    age;      public String toString() {          return "name";      }  } 

Is:

;Hand written (classDeclaration Person       (varDeclaration String name)      (varDeclaration int    age )      (funcDeclaration String toString             (return "name")      )  ) 

But I'm not quite sure how close or far am I to a real AST representation.

Does it depends on the language I choose. How much detail is needed? Are those "xyzDeclaraction" needed or could be as:

 (Person (String name) (int age)) 

Where can I see a "real" representation of an actual programming language to learn more.

like image 518
OscarRyz Avatar asked Jun 16 '11 18:06

OscarRyz


People also ask

What is an AST in programming?

An Abstract Syntax Tree, or AST, is a tree representation of the source code of a computer program that conveys the structure of the source code. Each node in the tree represents a construct occurring in the source code.

What is a syntax tree give an example of a syntax tree?

The syntax tree is shortened form of the Parse Tree. Example1 − Draw Syntax Tree for the string a + b ∗ c − d. Each node in a syntax tree can be executed as data with multiple fields. In the node for an operator, one field recognizes the operator and the remaining field includes a pointer to the nodes for the operands.

How would you define an Abstract Syntax Tree?

An abstract syntax tree (AST) is a way of representing the syntax of a programming language as a hierarchical tree-like structure. This structure is used for generating symbol tables for compilers and later code generation. The tree represents all of the constructs in the language and their subsequent rules.


2 Answers

AST is an abstraction of the CST (concrete syntax tree, or, parse tree). The concrete syntax tree is the tree resulting from the productions (in the grammar) used to parse the file. So your AST is basically derived from your grammar definition, but has for transformed

                        Exp                                           /  |  \                                         /   |   \                       *                  Ident BinOp Ident       into       / \                   /      |     \                  "x" "y"                  /       |      \                "x"       *      "y" 

All in all I think the example in your post looks fine. I would probably wrap the variable declarations in a varDeclList and the function declaration in a methDeclList, and the return statement in a stmtList. (See below.)

One more or less "real" representation of an AST is described by Apple in his book "Modern Compiler Implementation in Java". (Resources can be found here.)

Using those classes, your program would be represented as follows:

Program     ClassDeclList         ClassDecl             Identifier                 id: Person             VarDeclList                 VarDecl                     type: String                     id: name                 VarDecl                     type: int                     id: age             MethDeclList                 MethodDecl                     modifiers: public                     returnType: String                     id: toString                     Formals                         (empty)                     StmtList                         returnStmt                             Identifier                                 id: name 
like image 103
aioobe Avatar answered Nov 08 '22 08:11

aioobe


OP: Where can I see a real representation of an actual programming language to learn more?

For your source text as a file Person.java:

class Person {       String name;     int    age;     public String toString()       { return "name";     }  } 

what follows are both Concrete and Abstract Syntax Tree in an S-expression-style dump of the parser tree from our DMS Software Reengineering Toolkit, using its Java1.6 parser. All the apparant complexity is pretty much caused by the real complexity of the language (e.g., of Java itself).

The CST clearly contains more stuff (139 nodes) than the AST (54 nodes). The AST drops everything that can be automatically inferred from the grammar, given the AST. This includes removing non-value-carrying leaves, unary productions, and compressing spines caused by left or right recursive grammar rules into explicit list nodes.

A left paren signals a new subtree. Following the left paren is the name of the node type; @Java~Java1_.6 might seem unnecessary until you understand DMS can handle many languages at once, including langauges nested inside one another. The #nnnnnn is the memory address of the node. ^M means "this node has M parents and is left off when M==1. Things inside [...] are the node value. A { M } means this list node has M list-children. Each node is stamped with position information.

This is the Concrete Syntax tree (see further down for AST):

(compilation_unit@Java~Java1_6=1#4885d00^0 Line 1 Column 1 File C:/temp/Person.java  (type_declarations@Java~Java1_6=15#4885cc0 Line 1 Column 1 File C:/temp/Person.java   (type_declarations@Java~Java1_6=16#4884d80 Line 1 Column 1 File C:/temp/Person.java)type_declarations   (type_declaration@Java~Java1_6=17#4885ca0 Line 1 Column 1 File C:/temp/Person.java    (type_class_modifiers@Java~Java1_6=77#4884dc0 Line 1 Column 1 File C:/temp/Person.java)type_class_modifiers    (class_header@Java~Java1_6=89#4884ec0 Line 1 Column 1 File C:/temp/Person.java    |('class'@Java~Java1_6=459#4884c60[Keyword:0] Line 1 Column 1 File C:/temp/Person.java)'class'    |(IDENTIFIER@Java~Java1_6=447#4884e20[`Person'] Line 1 Column 7 File C:/temp/Person.java)IDENTIFIER    |(type_parameters@Java~Java1_6=408#4884e80 Line 1 Column 14 File C:/temp/Person.java)type_parameters    )class_header    (class_body@Java~Java1_6=94#4885c80 Line 1 Column 14 File C:/temp/Person.java    |('{'@Java~Java1_6=448#4884e60[Keyword:0] Line 1 Column 14 File C:/temp/Person.java)'{'    |(class_body_declarations@Java~Java1_6=111#4885c60 Line 2 Column 5 File C:/temp/Person.java    | (class_body_declarations@Java~Java1_6=111#4885380 Line 2 Column 5 File C:/temp/Person.java    |  (class_body_declarations@Java~Java1_6=110#4885400 Line 2 Column 5 File C:/temp/Person.java    |   (class_body_declaration@Java~Java1_6=118#4885360 Line 2 Column 5 File C:/temp/Person.java    |   |(field_declaration@Java~Java1_6=168#4885440 Line 2 Column 5 File C:/temp/Person.java    |   | (field_modifiers@Java~Java1_6=170#4884f40 Line 2 Column 5 File C:/temp/Person.java)field_modifiers    |   | (type@Java~Java1_6=191#48852c0 Line 2 Column 5 File C:/temp/Person.java    |   |  (name@Java~Java1_6=406#48851e0 Line 2 Column 5 File C:/temp/Person.java    |   |   (IDENTIFIER@Java~Java1_6=447#4884f20[`String'] Line 2 Column 5 File C:/temp/Person.java)IDENTIFIER    |   |   (type_arguments@Java~Java1_6=407#4885160 Line 2 Column 12 File C:/temp/Person.java)type_arguments    |   |  )name    |   |  (brackets@Java~Java1_6=157#4885260 Line 2 Column 12 File C:/temp/Person.java)brackets    |   | )type    |   | (variable_declarator_list@Java~Java1_6=179#4884e00 Line 2 Column 12 File C:/temp/Person.java    |   |  (variable_declarator@Java~Java1_6=181#4885300 Line 2 Column 12 File C:/temp/Person.java    |   |   (variable_declarator_id@Java~Java1_6=167#4885320 Line 2 Column 12 File C:/temp/Person.java    |   |   |(IDENTIFIER@Java~Java1_6=447#4885140[`name'] Line 2 Column 12 File C:/temp/Person.java)IDENTIFIER    |   |   |(brackets@Java~Java1_6=157#4885040 Line 2 Column 16 File C:/temp/Person.java)brackets    |   |   )variable_declarator_id    |   |  )variable_declarator    |   | )variable_declarator_list    |   | (';'@Java~Java1_6=440#4885100[Keyword:0] Line 2 Column 16 File C:/temp/Person.java)';'    |   |)field_declaration    |   )class_body_declaration    |  )class_body_declarations    |  (class_body_declaration@Java~Java1_6=118#48852e0 Line 3 Column 5 File C:/temp/Person.java    |   (field_declaration@Java~Java1_6=168#4885480 Line 3 Column 5 File C:/temp/Person.java    |   |(field_modifiers@Java~Java1_6=170#4885340 Line 3 Column 5 File C:/temp/Person.java)field_modifiers    |   |(type@Java~Java1_6=192#4885220 Line 3 Column 5 File C:/temp/Person.java    |   | (primitive_type@Java~Java1_6=198#4885420 Line 3 Column 5 File C:/temp/Person.java    |   |  ('int'@Java~Java1_6=479#48853e0[Keyword:0] Line 3 Column 5 File C:/temp/Person.java)'int'    |   | )primitive_type    |   | (brackets@Java~Java1_6=157#4885200 Line 3 Column 12 File C:/temp/Person.java)brackets    |   |)type    |   |(variable_declarator_list@Java~Java1_6=179#4885540 Line 3 Column 12 File C:/temp/Person.java    |   | (variable_declarator@Java~Java1_6=181#4885520 Line 3 Column 12 File C:/temp/Person.java    |   |  (variable_declarator_id@Java~Java1_6=167#4885500 Line 3 Column 12 File C:/temp/Person.java    |   |   (IDENTIFIER@Java~Java1_6=447#4884fc0[`age'] Line 3 Column 12 File C:/temp/Person.java)IDENTIFIER    |   |   (brackets@Java~Java1_6=157#48854e0 Line 3 Column 15 File C:/temp/Person.java)brackets    |   |  )variable_declarator_id    |   | )variable_declarator    |   |)variable_declarator_list    |   |(';'@Java~Java1_6=440#48854c0[Keyword:0] Line 3 Column 15 File C:/temp/Person.java)';'    |   )field_declaration    |  )class_body_declaration    | )class_body_declarations    | (class_body_declaration@Java~Java1_6=117#4885c40 Line 4 Column 5 File C:/temp/Person.java    |  (method_declaration@Java~Java1_6=135#4885c00 Line 4 Column 5 File C:/temp/Person.java    |   (method_modifiers@Java~Java1_6=141#4885700 Line 4 Column 5 File C:/temp/Person.java    |   |(method_modifiers@Java~Java1_6=142#4884e40 Line 4 Column 5 File C:/temp/Person.java)method_modifiers    |   |(method_modifier@Java~Java1_6=147#48856a0 Line 4 Column 5 File C:/temp/Person.java    |   | ('public'@Java~Java1_6=453#48853a0[Keyword:0] Line 4 Column 5 File C:/temp/Person.java)'public'    |   |)method_modifier    |   )method_modifiers    |   (type_parameters@Java~Java1_6=408#4885740 Line 4 Column 12 File C:/temp/Person.java)type_parameters    |   (type@Java~Java1_6=191#4885900 Line 4 Column 12 File C:/temp/Person.java    |   |(name@Java~Java1_6=406#48852a0 Line 4 Column 12 File C:/temp/Person.java    |   | (IDENTIFIER@Java~Java1_6=447#4885660[`String'] Line 4 Column 12 File C:/temp/Person.java)IDENTIFIER    |   | (type_arguments@Java~Java1_6=407#48851a0 Line 4 Column 19 File C:/temp/Person.java)type_arguments    |   |)name    |   |(brackets@Java~Java1_6=157#48858c0 Line 4 Column 19 File C:/temp/Person.java)brackets    |   )type    |   (IDENTIFIER@Java~Java1_6=447#48855c0[`toString'] Line 4 Column 19 File C:/temp/Person.java)IDENTIFIER    |   (parameters@Java~Java1_6=158#48858e0 Line 4 Column 27 File C:/temp/Person.java    |   |('('@Java~Java1_6=450#4885840[Keyword:0] Line 4 Column 27 File C:/temp/Person.java)'('    |   |(')'@Java~Java1_6=451#4885620[Keyword:0] Line 4 Column 28 File C:/temp/Person.java)')'    |   )parameters    |   (brackets@Java~Java1_6=157#4885060 Line 5 Column 7 File C:/temp/Person.java)brackets    |   (block@Java~Java1_6=217#4885be0 Line 5 Column 7 File C:/temp/Person.java    |   |('{'@Java~Java1_6=448#48851c0[Keyword:0] Line 5 Column 7 File C:/temp/Person.java)'{'    |   |(statement_sequence@Java~Java1_6=218#4885ba0 Line 5 Column 9 File C:/temp/Person.java    |   | (statement_sequence_member@Java~Java1_6=223#4885b80 Line 5 Column 9 File C:/temp/Person.java    |   |  (executable_statement@Java~Java1_6=243#4885b60 Line 5 Column 9 File C:/temp/Person.java    |   |   ('return'@Java~Java1_6=491#4884f60[Keyword:0] Line 5 Column 9 File C:/temp/Person.java)'return'    |   |   (expression@Java~Java1_6=332#4885ac0 Line 5 Column 16 File C:/temp/Person.java    |   |   |(conditional_expression@Java~Java1_6=345#4885a60 Line 5 Column 16 File C:/temp/Person.java    |   |   | (conditional_or_expression@Java~Java1_6=347#4885a20 Line 5 Column 16 File C:/temp/Person.java    |   |   |  (conditional_and_expression@Java~Java1_6=349#48859e0 Line 5 Column 16 File C:/temp/Person.java    |   |   |   (inclusive_or_expression@Java~Java1_6=351#48857e0 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |(exclusive_or_expression@Java~Java1_6=353#48855a0 Line 5 Column 16 File C:/temp/Person.java    |   |   |   | (and_expression@Java~Java1_6=355#4885940 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |  (equality_expression@Java~Java1_6=357#4885880 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   (relational_expression@Java~Java1_6=360#4885800 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   |(shift_expression@Java~Java1_6=366#48856c0 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   | (additive_expression@Java~Java1_6=370#4885180 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   |  (multiplicative_expression@Java~Java1_6=373#4885780 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   |   (unary_expression@Java~Java1_6=383#4885600 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   |   |(unary_expression_not_plus_minus@Java~Java1_6=389#4885680 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   |   | (literal@Java~Java1_6=390#4884f80 Line 5 Column 16 File C:/temp/Person.java    |   |   |   |   |   |  (STRING@Java~Java1_6=536#4885120[`name'] Line 5 Column 16 File C:/temp/Person.java)STRING    |   |   |   |   |   | )literal    |   |   |   |   |   |)unary_expression_not_plus_minus    |   |   |   |   |   )unary_expression    |   |   |   |   |  )multiplicative_expression    |   |   |   |   | )additive_expression    |   |   |   |   |)shift_expression    |   |   |   |   )relational_expression    |   |   |   |  )equality_expression    |   |   |   | )and_expression    |   |   |   |)exclusive_or_expression    |   |   |   )inclusive_or_expression    |   |   |  )conditional_and_expression    |   |   | )conditional_or_expression    |   |   |)conditional_expression    |   |   )expression    |   |   (';'@Java~Java1_6=440#48856e0[Keyword:0] Line 5 Column 22 File C:/temp/Person.java)';'    |   |  )executable_statement    |   | )statement_sequence_member    |   |)statement_sequence    |   |('}'@Java~Java1_6=449#4885b40[Keyword:0] Line 5 Column 28 File C:/temp/Person.java)'}'    |   )block    |  )method_declaration    | )class_body_declaration    |)class_body_declarations    |('}'@Java~Java1_6=449#4885bc0[Keyword:0] Line 6 Column 1 File C:/temp/Person.java)'}'    )class_body   )type_declaration  )type_declarations  (optional_CONTROL_Z@Java~Java1_6=5#4885ce0 Line 7 Column 1 File C:/temp/Person.java)optional_CONTROL_Z )compilation_unit 

This is the AST (automatically generated by DMS from the CST):

(compilation_unit@Java~Java1_6=1#486f900^0 Line 1 Column 1 File C:/temp/Person.java  (type_declarations@Java~Java1_6=15#486f4c0 {1} Line 1 Column 1 File C:/temp/Person.java   (type_declaration@Java~Java1_6=17#486f5e0 Line 1 Column 1 File C:/temp/Person.java    (type_class_modifiers@Java~Java1_6=77#486eda0 Line 1 Column 1 File C:/temp/Person.java)type_class_modifiers    (class_header@Java~Java1_6=89#486ee60 Line 1 Column 1 File C:/temp/Person.java    |(IDENTIFIER@Java~Java1_6=447#486ede0[`Person'] Line 1 Column 7 File C:/temp/Person.java)IDENTIFIER    |(type_parameters@Java~Java1_6=408#486ee20 Line 1 Column 14 File C:/temp/Person.java)type_parameters    )class_header    (class_body@Java~Java1_6=94#486f040 Line 1 Column 14 File C:/temp/Person.java    |(class_body_declarations@Java~Java1_6=111#486ee40 {3} Line 2 Column 5 File C:/temp/Person.java    | (class_body_declaration@Java~Java1_6=118#486f300 Line 2 Column 5 File C:/temp/Person.java    |  (field_declaration@Java~Java1_6=168#486f380 Line 2 Column 5 File C:/temp/Person.java    |   (field_modifiers@Java~Java1_6=170#486eec0 Line 2 Column 5 File C:/temp/Person.java)field_modifiers    |   (type@Java~Java1_6=191#486f240 Line 2 Column 5 File C:/temp/Person.java    |   |(name@Java~Java1_6=406#486f180 Line 2 Column 5 File C:/temp/Person.java    |   | (IDENTIFIER@Java~Java1_6=447#486eea0[`String'] Line 2 Column 5 File C:/temp/Person.java)IDENTIFIER    |   | (type_arguments@Java~Java1_6=407#486f0e0 Line 2 Column 12 File C:/temp/Person.java)type_arguments    |   |)name    |   |(brackets@Java~Java1_6=157#486f200 Line 2 Column 12 File C:/temp/Person.java)brackets    |   )type    |   (variable_declarator@Java~Java1_6=181#486ef20 Line 2 Column 12 File C:/temp/Person.java    |   |(variable_declarator_id@Java~Java1_6=167#486efe0 Line 2 Column 12 File C:/temp/Person.java    |   | (IDENTIFIER@Java~Java1_6=447#486f0c0[`name'] Line 2 Column 12 File C:/temp/Person.java)IDENTIFIER    |   | (brackets@Java~Java1_6=157#486f060 Line 2 Column 16 File C:/temp/Person.java)brackets    |   |)variable_declarator_id    |   )variable_declarator    |  )field_declaration    | )class_body_declaration    | (class_body_declaration@Java~Java1_6=118#486f000 Line 3 Column 5 File C:/temp/Person.java    |  (field_declaration@Java~Java1_6=168#486f320 Line 3 Column 5 File C:/temp/Person.java    |   (field_modifiers@Java~Java1_6=170#486f2a0 Line 3 Column 5 File C:/temp/Person.java)field_modifiers    |   (type@Java~Java1_6=192#486eee0 Line 3 Column 5 File C:/temp/Person.java    |   |(primitive_type@Java~Java1_6=198#486ef60 Line 3 Column 5 File C:/temp/Person.java)primitive_type    |   |(brackets@Java~Java1_6=157#486ee00 Line 3 Column 12 File C:/temp/Person.java)brackets    |   )type    |   (variable_declarator@Java~Java1_6=181#486f2c0 Line 3 Column 12 File C:/temp/Person.java    |   |(variable_declarator_id@Java~Java1_6=167#486f3a0 Line 3 Column 12 File C:/temp/Person.java    |   | (IDENTIFIER@Java~Java1_6=447#486f120[`age'] Line 3 Column 12 File C:/temp/Person.java)IDENTIFIER    |   | (brackets@Java~Java1_6=157#486ef00 Line 3 Column 15 File C:/temp/Person.java)brackets    |   |)variable_declarator_id    |   )variable_declarator    |  )field_declaration    | )class_body_declaration    | (class_body_declaration@Java~Java1_6=117#486f7a0 Line 4 Column 5 File C:/temp/Person.java    |  (method_declaration@Java~Java1_6=135#486f480 Line 4 Column 5 File C:/temp/Person.java    |   (method_modifiers@Java~Java1_6=141#486f460 {1} Line 4 Column 5 File C:/temp/Person.java    |   |(method_modifier@Java~Java1_6=147#486f400 Line 4 Column 5 File C:/temp/Person.java)method_modifier    |   )method_modifiers    |   (type_parameters@Java~Java1_6=408#486f540 Line 4 Column 12 File C:/temp/Person.java)type_parameters    |   (type@Java~Java1_6=191#486f740 Line 4 Column 12 File C:/temp/Person.java    |   |(name@Java~Java1_6=406#486f620 Line 4 Column 12 File C:/temp/Person.java    |   | (IDENTIFIER@Java~Java1_6=447#486f080[`String'] Line 4 Column 12 File C:/temp/Person.java)IDENTIFIER    |   | (type_arguments@Java~Java1_6=407#486f640 Line 4 Column 19 File C:/temp/Person.java)type_arguments    |   |)name    |   |(brackets@Java~Java1_6=157#486f700 Line 4 Column 19 File C:/temp/Person.java)brackets    |   )type    |   (IDENTIFIER@Java~Java1_6=447#486f140[`toString'] Line 4 Column 19 File C:/temp/Person.java)IDENTIFIER    |   (parameters@Java~Java1_6=158#486f760 Line 4 Column 27 File C:/temp/Person.java)parameters    |   (brackets@Java~Java1_6=157#486f820 Line 5 Column 7 File C:/temp/Person.java)brackets    |   (block@Java~Java1_6=217#486f780 Line 5 Column 7 File C:/temp/Person.java    |   |(statement_sequence@Java~Java1_6=218#486f6e0 Line 5 Column 9 File C:/temp/Person.java    |   | (statement_sequence_member@Java~Java1_6=223#486f6c0 Line 5 Column 9 File C:/temp/Person.java    |   |  (executable_statement@Java~Java1_6=243#486f6a0 Line 5 Column 9 File C:/temp/Person.java    |   |   (unary_expression_not_plus_minus@Java~Java1_6=389#486f720 Line 5 Column 16 File C:/temp/Person.java    |   |   |(literal@Java~Java1_6=390#486f280 Line 5 Column 16 File C:/temp/Person.java    |   |   | (STRING@Java~Java1_6=536#486f160[`name'] Line 5 Column 16 File C:/temp/Person.java)STRING    |   |   |)literal    |   |   )unary_expression_not_plus_minus    |   |  )executable_statement    |   | )statement_sequence_member    |   |)statement_sequence    |   )block    |  )method_declaration    | )class_body_declaration    |)class_body_declarations    )class_body   )type_declaration  )type_declarations  (optional_CONTROL_Z@Java~Java1_6=5#486f4e0 Line 7 Column 1 File C:/temp/Person.java)optional_CONTROL_Z )compilation_unit 

EDIT March 2015: Here's a link to some C++ AST examples

Edit May 2015: DMS has long done Java 1.7 and 1.8, too.

like image 28
Ira Baxter Avatar answered Nov 08 '22 08:11

Ira Baxter