Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing expressions from Token Stream

I'm trying to parse expressions for a simple scripting language, but I'm confused. Right now, only numbers and string literals can be parsed as an expression:

int x = 5;
double y = 3.4;
str t = "this is a string";

However, I am confused on parsing more complex expressions:

int a = 5 + 9;
int x = (5 + a) - (a ^ 2);

I'm thinking I would implement it like:

do {
    // no clue what I would do here?        

    if (current token is a semi colon) break;
}
while (true);

Any help would be great, I have no clue where to start. Thanks.

EDIT: My parser is a Recursive Descent Parser

My expression "class" is as follows:

typedef struct s_Expression {
    char type;
    Token *value;

    struct s_Expression *leftHand;
    char operand;
    struct s_Expression *rightHand;
} ExpressionNode;

Someone mentioned that a Recursive Descent Parser is capable of parsing expressions without doing infix, to postfix. Preferably, I would like an Expression like this:

For example this:

int x = (5 + 5) - (a / b);

Would be parsed into this: Note: this isn't valid C, this is just some pseudo ish code to get my point across simply :)

ExpressionNode lh;
lh.leftHand = new ExpressionNode(5);
lh.operand = '+'
lh.rightHand = new ExpressionNode(5);

ExpressionNode rh;
rh.leftHand = new ExpressionNode(a);
rh.operand = '/';
rh.rightHand = new ExpressionNode(b);

ExpressionNode res;
res.leftHand = lh;
res.operand = '-';
res.rightHand = rh;

I asked the question pretty late at night, so sorry if I wasn't clear and that I completely forgot what my original goal was.

like image 469
metro-man Avatar asked Nov 01 '22 13:11

metro-man


2 Answers

There are multiple methods to do this. One I've used in the past is to read the input string (that is, the programming language code) and convert expressions from infix to reverse-polish notation. Here's a really good post about doing just that:

http://andreinc.net/2010/10/05/converting-infix-to-rpn-shunting-yard-algorithm/

Note that = is an operator also, so your parsing should really include the whole file of code, and not just certain expressions.

Once in reverse-polish, expressions are super-easy to evaluate. You just pop the stack, storing operands as you go, until you hit an operator. Pop as many operands from the stack as required by the operator and perform your operation.

like image 151
Jonathan M Avatar answered Nov 15 '22 06:11

Jonathan M


The method I ended up using is Operator precedence parsing.

parse_expression_1 (lhs, min_precedence)
    lookahead := peek next token
    while lookahead is a binary operator whose precedence is >= min_precedence
        op := lookahead
        advance to next token
        rhs := parse_primary ()
        lookahead := peek next token
        while lookahead is a binary operator whose precedence is greater
                 than op's, or a right-associative operator
                 whose precedence is equal to op's
            rhs := parse_expression_1 (rhs, lookahead's precedence)
            lookahead := peek next token
        lhs := the result of applying op with operands lhs and rhs
    return lhs
like image 38
metro-man Avatar answered Nov 15 '22 07:11

metro-man