Recursive Descent Parser

Tags: c++, parsing, compiler-construction

The book 'Modern Compiler Design' is a nice book about compilers. One thing in its source code that bothers me is the AST (Abstract Syntax Tree). Suppose we want to write a parser for parenthesized expressions that parses something like ((2+3)*4) * 2. The book says that we have an AST like:

        ((2+3)*4) * 2
          /   |     \
       (2+3)  *4    * 2
        /     | \
     (2+3)    *  4
     / | \
    2  + 3

So should I save a tree in memory, or just use recursive calls? Note: if I don't store it in memory, how can I convert it to machine code?

Parser code:

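int parse(Expression *expr)
{
  // The token field is named `cls` here because `class` is a reserved word in C++.
  if(token.cls=='D')
  {
    expr->type='D';
    expr->value=token.val-'0';
    get_next_token();
    return 1;
  }
  if(token.cls=='(')
  {
    expr->type='P';
    get_next_token();
    // Assumes left and right are Expression pointers; allocate them before parsing into them.
    expr->left=new Expression;
    if(!parse(expr->left))
      Error("expression expected");
    parse_operator(&expr->op);
    expr->right=new Expression;
    if(!parse(expr->right))
      Error("expression expected");
    if(token.cls!=')')
      Error("missing )");
    get_next_token();
    return 1;
  }
  return 0;
}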

Grammar is:

expr  -> digit | ( expr op expr )
digit -> 0 | 1 | 2 | ... | 9
op    -> + | *


asked Jan 07 '12 06:01

S.A.Parkhid

2 Answers

You can store the tree in memory, or you can directly produce the required output code. Storing the intermediate form is normally done so that you can process the code at a higher level before generating output.

In your case, for example, it would be simple to discover that the expression contains no variables and therefore that its result is a fixed number. This is not possible, however, when looking at only one node at a time. To be more explicit: if after reading "2*" you generate machine code that doubles something, that code is wasted when the other operand turns out to be, say, "3". Your program will compute "3" and then double it on every run, while simply loading "6" would be equivalent but shorter and faster.
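To make the constant-folding point concrete, here is a minimal sketch of what keeping the tree in memory allows. The Node struct, its 'V' type for variables and the helper names (number, variable, binary, fold) are made up for illustration and come from neither the book nor the question. It assumes only + and * and ignores integer overflow.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical AST node, for illustration only.
struct Node {
    char type = 'D';   // 'D' = number literal, 'V' = variable, 'P' = binary operation
    int value = 0;     // meaningful when type == 'D'
    char name = 0;     // meaningful when type == 'V'
    char op = 0;       // '+' or '*', meaningful when type == 'P'
    std::unique_ptr<Node> left, right;
};

std::unique_ptr<Node> number(int v) {
    auto n = std::make_unique<Node>();
    n->type = 'D';
    n->value = v;
    return n;
}

std::unique_ptr<Node> variable(char name) {
    auto n = std::make_unique<Node>();
    n->type = 'V';
    n->name = name;
    return n;
}

std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    assert(l && r);
    auto n = std::make_unique<Node>();
    n->type = 'P';
    n->op = op;
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Fold constant subtrees bottom-up: an operation whose two operands
// are both literals is replaced by a single literal node.
void fold(Node &n) {
    if (n.type != 'P') return;
    fold(*n.left);
    fold(*n.right);
    if (n.left->type == 'D' && n.right->type == 'D') {
        n.value = (n.op == '+') ? n.left->value + n.right->value
                                : n.left->value * n.right->value;
        n.type = 'D';
        n.left.reset();
        n.right.reset();
    }
}
```

With this, the tree for 2*3 collapses to the single literal 6 before any code is generated, while 2*(x+3) is left untouched because one of its leaves is a variable.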

If you want to generate machine code, you first need to know what kind of machine you are generating it for. The simplest model uses a stack-based approach: it needs no register allocation logic, and it is easy to compile directly to machine code without an intermediate representation. Consider this small example that handles just integers, the four arithmetic operations, unary negation and variables. Notice that no data structure is used at all: source characters are read and machine instructions (as assembly text) are written to the output.

#include <stdio.h>
#include <stdlib.h>

void error(const char *what) {
    fprintf(stderr, "ERROR: %s\n", what);
    exit(1);
}

void compileLiteral(const char *& s) {
    int v = 0;
    while (*s >= '0' && *s <= '9') {
        v = v*10 + *s++ - '0';
    }
    printf("    mov  eax, %i\n", v);
}

void compileSymbol(const char *& s) {
    printf("    mov  eax, dword ptr ");
    while ((*s >= 'a' && *s <= 'z') ||
           (*s >= 'A' && *s <= 'Z') ||
           (*s >= '0' && *s <= '9') ||
           (*s == '_')) {
        putchar(*s++);
    }
    printf("\n");
}

void compileExpression(const char *&);

void compileTerm(const char *& s) {
    if (*s >= '0' && *s <= '9') {
        // Number
        compileLiteral(s);
    } else if ((*s >= 'a' && *s <= 'z') ||
               (*s >= 'A' && *s <= 'Z') ||
               (*s == '_')) {
        // Variable
        compileSymbol(s);
    } else if (*s == '-') {
        // Unary negation
        s++;
        compileTerm(s);
        printf("    neg  eax\n");
    } else if (*s == '(') {
        // Parenthesized sub-expression
        s++;
        compileExpression(s);
        if (*s != ')')
            error("')' expected");
        s++;
    } else {
        error("Syntax error");
    }
}

void compileMulDiv(const char *& s) {
    compileTerm(s);
    for (;;) {
        if (*s == '*') {
            s++;
            printf("    push eax\n");
            compileTerm(s);
            printf("    mov  ebx, eax\n");
            printf("    pop  eax\n");
            printf("    imul ebx\n");
        } else if (*s == '/') {
            s++;
            printf("    push eax\n");
            compileTerm(s);
            printf("    mov  ebx, eax\n");
            printf("    pop  eax\n");
            // idiv divides edx:eax, so sign-extend eax into edx first
            printf("    cdq\n");
            printf("    idiv ebx\n");
        } else break;
    }
}

void compileAddSub(const char *& s) {
    compileMulDiv(s);
    for (;;) {
        if (*s == '+') {
            s++;
            printf("    push eax\n");
            compileMulDiv(s);
            printf("    mov  ebx, eax\n");
            printf("    pop  eax\n");
            printf("    add  eax, ebx\n");
        } else if (*s == '-') {
            s++;
            printf("    push eax\n");
            compileMulDiv(s);
            printf("    mov  ebx, eax\n");
            printf("    pop  eax\n");
            printf("    sub  eax, ebx\n");
        } else break;
    }
}

void compileExpression(const char *& s) {
    compileAddSub(s);
}

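int main(int argc, const char *argv[]) {
    if (argc != 2) error("Syntax: simple-compiler <expr>");
    const char *s = argv[1];
    compileExpression(s);
    // Reject input such as "1+2)" where parsing stopped before the end
    if (*s != '\0') error("Unexpected character after expression");
    return 0;
}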

For example, running the compiler with 1+y*(-3+x) as input produces this output:

mov  eax, 1
push eax
mov  eax, dword ptr y
push eax
mov  eax, 3
neg  eax
push eax
mov  eax, dword ptr x
mov  ebx, eax
pop  eax
add  eax, ebx
mov  ebx, eax
pop  eax
imul ebx
mov  ebx, eax
pop  eax
add  eax, ebx

However, this approach to writing compilers doesn't scale well to an optimizing compiler.

While some optimization is possible by adding a "peephole" optimizer to the output stage, many useful optimizations are only possible when looking at the code from a higher-level point of view.
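As a rough idea of what such a peephole pass could look like, here is a sketch that knows a single rule: the sequence "push eax / mov eax, X / mov ebx, eax / pop eax" leaves eax unchanged and X in ebx, so it can be replaced by "mov ebx, X". The function name peephole and the line-per-string representation are assumptions for illustration. It also assumes the lines carry no leading indentation (as in the listing above) and that X never refers to eax or esp, which holds for the literals and variable names this compiler emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical single-rule peephole pass over already generated assembly lines.
std::vector<std::string> peephole(const std::vector<std::string> &in) {
    const std::string load = "mov  eax, ";   // prefix of the instruction that loads X
    std::vector<std::string> out;
    for (std::size_t i = 0; i < in.size(); ) {
        const bool match =
            i + 3 < in.size() &&
            in[i] == "push eax" &&
            in[i + 1].compare(0, load.size(), load) == 0 &&
            in[i + 2] == "mov  ebx, eax" &&
            in[i + 3] == "pop  eax";
        if (match) {
            // Load X straight into ebx; eax keeps its earlier value.
            out.push_back("mov  ebx, " + in[i + 1].substr(load.size()));
            i += 4;
        } else {
            out.push_back(in[i]);
            ++i;
        }
    }
    return out;
}
```

Applied to the listing above, this turns the four instructions around "mov  eax, dword ptr x" into the single "mov  ebx, dword ptr x". It cannot see that the whole expression could be reordered, which is the kind of decision that needs a higher-level view.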

Even bare machine code generation can benefit from seeing more code, for example to decide which register to assign to which value, or to decide which of the possible assembler implementations is best for a specific code pattern.

For example, an optimizing compiler could compile the same expression to

mov  eax, dword ptr x
sub  eax, 3
imul dword ptr y
inc  eax


answered Oct 09 '22 03:10

6502

Nine times out of ten you'll save the AST in memory for whatever you are doing after lexing and parsing are done.

Once you have an AST you can do a number of things:

Evaluate it directly (perhaps using recursion, perhaps using your own custom stack)
Transform it into some other output, such as code in another language or some other type of translation.
Compile it to your preferred instruction set
etc.
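The first option could look roughly like the sketch below: a recursive descent parser for the question's grammar (with expr -> digit | ( expr op expr )) that builds the tree in memory, followed by a recursive evaluator. The names Expr, parseExpr, evaluate and evaluateString are invented for this example, errors are reported with exceptions, and whitespace is not handled.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical in-memory AST node for the question's grammar.
struct Expr {
    char type = 'D';   // 'D' = digit, 'P' = parenthesized operation
    int value = 0;     // meaningful when type == 'D'
    char op = 0;       // '+' or '*', meaningful when type == 'P'
    std::unique_ptr<Expr> left, right;
};

// expr -> digit | '(' expr op expr ')'
std::unique_ptr<Expr> parseExpr(const char *&s) {
    auto e = std::make_unique<Expr>();
    if (*s >= '0' && *s <= '9') {
        e->type = 'D';
        e->value = *s++ - '0';
        return e;
    }
    if (*s == '(') {
        ++s;
        e->type = 'P';
        e->left = parseExpr(s);
        if (*s != '+' && *s != '*')
            throw std::runtime_error("operator expected");
        e->op = *s++;
        e->right = parseExpr(s);
        if (*s != ')')
            throw std::runtime_error("missing )");
        ++s;
        return e;
    }
    throw std::runtime_error("digit or ( expected");
}

// Evaluate the tree directly, using recursion.
int evaluate(const Expr &e) {
    if (e.type == 'D') return e.value;
    assert(e.left && e.right);
    const int l = evaluate(*e.left);
    const int r = evaluate(*e.right);
    return e.op == '+' ? l + r : l * r;
}

// Parse a whole string and evaluate it; trailing input is an error.
int evaluateString(const char *s) {
    auto tree = parseExpr(s);
    if (*s != '\0')
        throw std::runtime_error("unexpected character after expression");
    return evaluate(*tree);
}
```

For instance, evaluateString("((2+3)*4)") returns 20. Because the tree outlives the parse, the same Expr could just as well be handed to a translator or code generator instead of evaluate.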

answered Oct 09 '22 03:10

sirbrialliance
