Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?

I've been reading a bit about how interpreters/compilers work, and one area where I'm getting confused is the difference between an AST and a CST. My understanding is that the parser makes a CST, hands it to the semantic analyzer which turns it into an AST. However, my understanding is that the semantic analyzer simply ensures that rules are followed. I don't really understand why it would actually make any changes to make it abstract rather than concrete.

Is there something that I'm missing about the semantic analyzer, or is the difference between an AST and CST somewhat artificial?

like image 796
Jason Baker Avatar asked Dec 11 '09 15:12

Jason Baker


People also ask

What is the difference between abstract syntax and concrete syntax?

It consists of a set of rules (productions) that define the way programs look like to the programmer. The concrete syntax of Pico is defined by the productions in the file ConcrSyn. pco . The abstract syntax of an implementation is the set of trees used to represent programs in the implementation.

What is Abstract Syntax Tree?

An Abstract Syntax Tree, or AST, is a tree representation of the source code of a computer program that conveys the structure of the source code. Each node in the tree represents a construct occurring in the source code.


1 Answers

A concrete syntax tree represents the source text exactly in parsed form. In general, it conforms to the context-free grammar defining the source language.

However, the concrete grammar and tree have a lot of things that are necessary to make source text unambiguously parseable, but do not contribute to actual meaning. For example, to implement operator precedence, your CFG usually has several levels of expression components (term, factor, etc.), with the operators connecting them at the different levels (you add terms to get expressions, terms are composed of factors optionally multipled, etc.). To actually interpret or compile the language, however, you don't need this; you just need Expression nodes that have operators and operands. The abstract syntax tree is the result of simplifying the concrete syntax tree down to the things actually needed to represent the meaning of the program. This tree has a much simpler definition and is thus easier to process in the later stages of execution.

You usually don't need to actually build a concrete syntax tree. The action routines in your YACC (or Antlr, or Menhir, or whatever...) grammar can directly build the abstract syntax tree, so the concrete syntax tree only exists as a conceptual entity representing the parse structure of your source text.

like image 71
Michael Ekstrand Avatar answered Sep 23 '22 04:09

Michael Ekstrand