What's the difference between parse trees and abstract syntax trees (ASTs)?

2 Answers

This is based on the Expression Evaluator grammar by Terrence Parr.

The grammar for this example:

grammar Expr002;  options  {     output=AST;     ASTLabelType=CommonTree; // type of $stat.tree ref etc... }  prog    :   ( stat )+ ;  stat    :   expr NEWLINE        -> expr         |   ID '=' expr NEWLINE -> ^('=' ID expr)         |   NEWLINE             ->         ;  expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*         ;   multExpr         :   atom ('*'^ atom)*         ;   atom    :   INT          |   ID         |   '('! expr ')'!         ;  ID      : ('a'..'z' | 'A'..'Z' )+ ; INT     : '0'..'9'+ ; NEWLINE : '\r'? '\n' ; WS      : ( ' ' | '\t' )+ { skip(); } ;

Input

x=1 y=2 3*(x+y)

Parse Tree

The parse tree is a concrete representation of the input. The parse tree retains all of the information of the input. The empty boxes represent whitespace, i.e. end of line.

Parse Tree

AST

The AST is an abstract representation of the input. Notice that parens are not present in the AST because the associations are derivable from the tree structure.

AST

For a more through explanation see Compilers and Compiler Generators pg. 23
or Abstract Syntax Trees on pg. 21 in Syntax and Semantics of Programming Languages

183

answered Sep 22 '22 05:09

Guy Coder

Here's an explanation of parse trees (concrete syntax trees, CSTs) and abstract syntax trees (ASTs), in the context of compiler construction. They're similar data structures, but they're constructed differently and used for different tasks.

Parse trees

Parse trees are usually generated as the next step after lexical analysis (which turns the source code into a series of tokens that can be viewed as meaningful units, as opposed to just a sequence of characters).

They are tree-like data structures that shows how an input string of terminals (source code tokens) has been generated by the grammar of the language in question. The root of the parse tree is the most general symbol of the grammar - the start symbol (for example, statement), and the interior nodes represent nonterminal symbols that the start symbol expands to (can include the start symbol itself), such as expression, statement, term, function call. The leaves are the terminals of the grammar, the actual symbols which appear as identifiers, keywords, and constants in the language / input string, e.g. for, 9, if, etc.

While parsing the compiler also performs various checks to ensure the correctness of syntax - and and syntax error reports can be imbedded into parser code.

They can be used for syntax-directed translation via syntax-directed definitions or translation schemes, for simple tasks such as converting an infix expression to a postfix one.

Here's a graphical representation of a parse tree for the expression 9 - 5 + 2 (note the placement of the terminals in the tree and the actual symbols from the expression string):

enter image description here

Abstract syntax trees

ASTs represent the syntactic structure of the some code. The trees of programming constructs such as expressions, flow control statements, etc - grouped into operators (interior nodes) and operands (leaves). For example, the syntax tree for the expression i + 9 would have the operator + as root, the variable i as the operator's left child, and the number 9 as the right child.

The difference here is that nonterminals and terminals don't play a role, as ASTs don't deal with grammars and string generation, but programming constructs, and thus they represent relationships between such constructs, and not the ways they are generated by a grammar.

Note that the operators themselves are programming constructs in a given language, and don't have to be actual computational operators (like + is): for loops would also be treated in this way. For example, you could have a syntax tree such as for [ expr, expr, expr, stmnt ] (represented inline), where for is an operator, and the elements inside the square brackets are its children (representing C's for syntax) - also composed out of operators etc.

ASTs are usually generated by compilers in the syntax analysis (parsing) phase as well, and are used later for semantic analysis, intermediate representation, code generation, etc.

Here's a graphical representation of an AST:

enter image description here

answered Sep 24 '22 05:09

corazza

Related questions
                            
                                What is the difference between LR(0) and SLR parsing?
                            
                                Where can I learn the basics of writing a lexer?
                            
                                Do compilers produce better code for do-while loops versus other types of loops?
                            
                                How can the Linux kernel compile itself?
                            
                                Why is writing a compiler in a functional language easier? [closed]
                            
                                combine two GCC compiled .o object files into a third .o file
                            
                                Is gcc C compiler written in C itself?
                            
                                GCC: how is march different from mtune?
                            
                                How to approach creating a JVM programming language?
                            
                                Is the creation of Java class files deterministic?
                            
                                What kinds of patterns could I enforce on the code to make it easier to translate to another programming language? [closed]
                            
                                Compile, Build or Archive problems with Xcode 4 (and dependencies)
                            
                                Why does this Java code compile?
                            
                                What is the difference between a token and a lexeme?
                            
                                Are GCC and Clang parsers really handwritten?
                            
                                implementing type inference
                            
                                Why does this (null || !TryParse) conditional result in "use of unassigned local variable"?
                            
                                Bootstrapping still requires outside support
                            
                                Eclipse: enable assertions
                            
                                Why is it impossible to build a compiler that can determine if a C++ function will change the value of a particular variable?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What's the difference between parse trees and abstract syntax trees (ASTs)?

Tags:

terminology

compiler-construction

abstract-syntax-tree

compiler-theory

parse-tree

Thomson

People also ask