How to construct an abstract syntax tree

2 Answers

Well, first off, the grammar is used to construct a parse tree from an expression. So if you already have a parse tree, you don't need the grammar.

Depending on how much work your parser does, the resulting tree that is formed from parsing an expression could already be an abstract syntax tree. Or it could be a simple parse tree which requires a second pass to construct the ast.

To construct the parse tree from a grammar and an expression, you would first have to convert your grammar into working code. Typically, you would split the work into a tokenizer which splits the input stream representing the expression into a list of tokens, and a parser which takes the list of tokens and constructs a parse tree\ast from it.

So the expression 1 + 2*(3+4) might be split into a list of tokens like this:

1 - int + - add_operator 2 - int * - mul_operator ( - lparen 3 - int + - add_operator 4 - int ) - rparen

The first column is the actual text value. The second represents the token type. These tokens are fed into the parser, which is built from your grammar and recognizes the tokens and builds the parse tree.

So, how does one write the lexical tokenizer and the actual parser? You could roll your own by hand. Or, more commonly, use a parser generator like coco or antlr or lex/yacc. These tools take a description of your grammar and generate the code for a tokenzier and parser. (Code generators exist for most popular languages and some unpopular ones as well.)

How you construct your parser depends heavily on what language you use. How you would write a parser in Haskell is entirely different from how you would in, say, C.

Here's a tutorial that shows you how to build your own recursive descent parser.
Coco is a parser generator for various languages, which also comes with documentation on how to get started.
If Python's your thing, then pyparsing maybe for you.

145

answered Oct 07 '22 06:10

HS.

The answers have been dis-satisfactory at answering the question of "how to construct an AST".

This article archived from the series Crafting Interpreters, answers it neatly and easily for beginners. Complete with implementation. Stackoverflow is not a place for links so I will copy the main points.

Recursive Descent Parsing

There is a whole pack of parsing techniques whose names mostly seem to be combinations of “L” and “R”—LL(k), LR(1), LALR—along with more exotic beasts like parser combinators, Earley parsers, the shunting yard algorithm, and packrat parsing. For our first interpreter, one technique is more than sufficient: recursive descent.

Recursive descent is the simplest way to build a parser, and doesn’t require using complex parser generator tools like Yacc, Bison or ANTLR. All you need is straightforward hand-written code. Don’t be fooled by its simplicity, though. Recursive descent parsers are fast, robust, and can support sophisticated error-handling. In fact, GCC, V8 (the JavaScript VM in Chrome), Roslyn (the C# compiler written in C#) and many other heavyweight production language implementations use recursive descent. It kicks ass.

It is considered a top-down parser because it starts from the top or outermost grammar rule (here expression) and works its way down into the nested subexpressions before finally reaching the leaves of the syntax tree. This is in contrast with bottom-up parsers like LR that start with primary expressions and compose them into larger and larger chunks of syntax.

A recursive descent parser is a literal translation of the grammar’s rules straight into imperative code. Each rule becomes a function. The body of the rule translates to code roughly like:

Grammar notation    Code representation Terminal            Code to match and consume a token NonterminalCall     to that rule’s function |                   if or switch statement * or +              while or for loop ?                   if statement

It’s called “recursive descent” because when a grammar rule refers to itself—directly or indirectly—that translates to recursive method calls.

Side notes:

It’s called “recursive descent” because it walks down the grammar. Confusingly, we also use direction metaphorically when talking about “high” and “low” precedence, but the orientation is reversed. In a top-down parser, you reach the lowest-precedence expressions first because they may in turn contain subexpressions of higher precedence. Top-down grammar rules in order of increasing precedence.

Original image link

parsing expressions direction

CS people really need to get together and straighten out their metaphors. Don’t even get me started on which direction the stack is supposed to grow.

answered Oct 07 '22 05:10

Default

Related questions
                            
                                Downcast traits inside Rc for AST manipulation
                            
                                What is the best way to simulate java.lang.Thread?
                            
                                What are motivations behind compiling to byte-code?
                            
                                Example for ast.NodeTransformer that mutates an equation
                            
                                Python AST with preserved comments
                            
                                How to output the AST built using ANTLR?
                            
                                What does Lambda Expression Compile() method do?
                            
                                Representing an Abstract Syntax Tree in C
                            
                                Compiling an AST back to source code
                            
                                How to write the Visitor Pattern for Abstract Syntax Tree in Python?
                            
                                What is an AST transformation?
                            
                                How to get lineno of "end-of-statement" in Python ast
                            
                                Python 3, Are there any known security holes in ast.literal_eval(node_or_string)?
                            
                                Constructing an Abstract Syntax Tree with a list of Tokens
                            
                                How to view Clang AST?
                            
                                Malformed String ValueError ast.literal_eval() with String representation of Tuple
                            
                                TypeScript: get syntax tree
                            
                                Simple example of how to use ast.NodeVisitor?
                            
                                What is JavaScript AST, how to play with it?
                            
                                What's the difference between parse trees and abstract syntax trees? [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to construct an abstract syntax tree

Tags:

abstract-syntax-tree

neuromancer

People also ask

2 Answers

HS.

Recursive Descent Parsing

Default

Recent Activity

Donate For Us