Crockford's Top Down Operator Precedence

Question

Out of interest, I want to learn how to write a parser for a simple language, and ultimately an interpreter for my own little code-golfing language, once I understand how such things work in general.

So I started reading Douglas Crockford's article Top Down Operator Precedence.

Note: You should probably read the article if you want a deeper understanding of the context of the code snippets below.

I have trouble understanding how the var statement and the assignment operator = should work together.

D.C. defines an assignment operator like this:

var assignment = function (id) {
    return infixr(id, 10, function (left) {
        if (left.id !== "." && left.id !== "[" &&
                left.arity !== "name") {
            left.error("Bad lvalue.");
        }
        this.first = left;
        this.second = expression(9);
        this.assignment = true;
        this.arity = "binary";
        return this;
    });
};
assignment("=");

Note: [[value]] refers to a token, simplified to its value.

Now if the expression function reaches e.g. [[t]], [[=]], [[2]], the result of [[=]].led is something like this:

{
    "arity": "binary",
    "value": "=",
    "assignment": true, //<-
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "2"
    }
}
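To make the role of [[=]].led concrete, here is a minimal sketch of the expression loop, reduced to names, number literals and "=". The parseExpression helper and its space-separated token format are my own assumptions for illustration, not Crockford's code; only the lvalue check and the members set in the led mirror the snippet above.

```javascript
// Hypothetical helper: tokens are separated by spaces, e.g. "t = 2".
function parseExpression(source) {
    var tokens = source.split(/\s+/).filter(Boolean).map(function (text) {
        if (text === "=") {
            return { id: "=", value: "=", lbp: 10 };
        }
        if (/^\d+$/.test(text)) {
            return { id: "(literal)", value: text, arity: "literal", lbp: 0 };
        }
        return { id: "(name)", value: text, arity: "name", lbp: 0 };
    });
    tokens.push({ id: "(end)", lbp: 0 });

    var pos = 0;
    var token = tokens[pos];

    function advance() {
        pos += 1;
        token = tokens[pos];
    }

    // The led of "=", with the same checks as the assignment function above.
    function assignmentLed(node, left) {
        if (left.id !== "." && left.id !== "[" && left.arity !== "name") {
            throw new SyntaxError("Bad lvalue.");
        }
        node.first = left;
        node.second = expression(9); // one below lbp 10, so "=" is right associative
        node.assignment = true;
        node.arity = "binary";
        return node;
    }

    function expression(rbp) {
        var left = token; // names and literals stand for themselves (their nud)
        if (left.arity !== "name" && left.arity !== "literal") {
            throw new SyntaxError("Expected a name or a literal.");
        }
        advance();
        // Keep going while the next operator binds tighter than rbp.
        while (rbp < token.lbp) {
            var operator = token;
            advance();
            left = assignmentLed(operator, left); // "=" is the only operator here
        }
        return left;
    }

    return expression(0);
}

var tree = parseExpression("t = 2");
console.log(tree.assignment, tree.first.value, tree.second.value); // prints: true t 2
```

Note that the led receives the already parsed left operand as an argument and is only called after the operator token itself has been consumed.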

D.C. makes the assignment function because

we want it to do two extra bits of business: examine the left operand to make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements.

Which makes sense to me up to the point where he introduces the var statement, which is defined as follows.

The var statement defines one or more variables in the current block. Each name can optionally be followed by = and an initializing expression.

stmt("var", function () {
    var a = [], n, t;
    while (true) {
        n = token;
        if (n.arity !== "name") {
            n.error("Expected a new variable name.");
        }
        scope.define(n);
        advance();
        if (token.id === "=") {
            t = token;
            advance("=");
            t.first = n;
            t.second = expression(0);
            t.arity = "binary";
            a.push(t);
        }
        if (token.id !== ",") {
            break;
        }
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
});

Now if the parser reaches a set of tokens like [[var]], [[t]], [[=]], [[1]], the generated tree would look something like this:

{
    "arity": "binary",
    "value": "=",
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}

The key part of my question is the if (token.id === "=") {...} part.

I don't understand why we call

    t = token;
    advance("=");
    t.first = n;
    t.second = expression(0);
    t.arity = "binary";
    a.push(t);

rather than

    t = token;
    advance("=");
    t.led(n);
    a.push(t);

in the ... part.

This would call our [[=]] operator's led function (defined via the assignment function), which would

make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements.

The resulting tree would then be, e.g.:

{
    "arity": "binary",
    "value": "=",
    "assignment": true,
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}

Since there is no operator with an lbp between 0 and 10, calling expression(0) vs. expression(9) makes no difference: the loop in expression continues while rbp < token.lbp, and !(0 < 0) && !(9 < 0) && 0 < 10 && 9 < 10.

And the token.id === "=" condition prevents assignments to an object member, as token.id would then be either '[' or '.' and t.led wouldn't be called.
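The comparison can be checked mechanically. The sketch below lists left binding powers as I read them from the article (treat the table as an assumption and compare it with your copy) and collects every token for which the loop condition rbp < token.lbp would give a different answer for rbp 0 and rbp 9.

```javascript
// Left binding powers as I read them from the article (an assumption).
var lbp = {
    ";": 0, ",": 0, ")": 0, "]": 0,
    "=": 10, "+=": 10, "-=": 10,
    "?": 20,
    "&&": 30, "||": 30,
    "===": 40, "!==": 40, "<": 40, "<=": 40, ">": 40, ">=": 40,
    "+": 50, "-": 50,
    "*": 60, "/": 60,
    ".": 80, "[": 80, "(": 80
};

// expression(rbp) keeps consuming operators while rbp < token.lbp.
function continues(rbp, id) {
    return rbp < lbp[id];
}

// Tokens on which expression(0) and expression(9) would disagree.
var differing = Object.keys(lbp).filter(function (id) {
    return continues(0, id) !== continues(9, id);
});

console.log(differing.length); // prints: 0
```

The list would only become non-empty if some token had an lbp from 1 to 9.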

My question in short is:

Why do we not call the led function of the assignment operator that may optionally follow a variable declaration, and instead manually set the first and second members of the statement, but not the assignment member?

Here are two fiddles parsing a simple string: one using the original code and one using the assignment operator's led.
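For readers without access to the fiddles, the two variants can be condensed into one sketch. Everything here is an assumption for illustration: parseVar, its useLed flag and the space-separated token format are hypothetical, scope.define is left out, and only names, number literals, "=", "," and ";" are handled.

```javascript
// Sketch only: parseVar and its token format are hypothetical.
function parseVar(source, useLed) {
    var tokens = source.split(/\s+/).filter(Boolean).map(function (text) {
        if (/^\d+$/.test(text)) {
            return { id: "(literal)", value: text, arity: "literal", lbp: 0 };
        }
        if (/^[a-z_]\w*$/i.test(text) && text !== "var") {
            return { id: "(name)", value: text, arity: "name", lbp: 0 };
        }
        return { id: text, value: text, lbp: text === "=" ? 10 : 0 };
    });
    tokens.push({ id: "(end)", lbp: 0 });
    var pos = 0, token = tokens[0];

    function advance(id) {
        if (id && token.id !== id) {
            throw new SyntaxError("Expected '" + id + "'.");
        }
        pos += 1;
        token = tokens[pos];
    }

    // Same body as the led passed to infixr in the assignment function.
    function led(node, left) {
        if (left.id !== "." && left.id !== "[" && left.arity !== "name") {
            throw new SyntaxError("Bad lvalue.");
        }
        node.first = left;
        node.second = expression(9);
        node.assignment = true;
        node.arity = "binary";
        return node;
    }

    function expression(rbp) {
        var left = token;
        if (left.arity !== "name" && left.arity !== "literal") {
            throw new SyntaxError("Expected a name or a literal.");
        }
        advance();
        while (rbp < token.lbp) {
            var operator = token;
            advance();
            left = led(operator, left);
        }
        return left;
    }

    // The var statement, following the structure of the original.
    advance("var");
    var a = [], n, t;
    while (true) {
        n = token;
        if (n.arity !== "name") {
            throw new SyntaxError("Expected a new variable name.");
        }
        advance();
        if (token.id === "=") {
            t = token;
            advance("=");
            if (useLed) {
                led(t, n);                 // variant from the question
            } else {
                t.first = n;               // original variant
                t.second = expression(0);
                t.arity = "binary";
            }
            a.push(t);
        }
        if (token.id !== ",") {
            break;
        }
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
}

console.log(parseVar("var t = 1 ;", false).assignment); // prints: undefined
console.log(parseVar("var t = 1 ;", true).assignment);  // prints: true
```

In this reduced setting the two trees differ only in the assignment member, which is exactly the difference the question asks about.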

Benjamin Gruenbaum · Accepted Answer

When parsing a language, two things matter: semantics and syntax.

Semantically, var x=5; and var x;x=5 seem very close, if not identical, since in both cases a variable is first declared and then a value is assigned to that declared variable. This is what you've observed, and it is correct for the most part.

Syntactically, however, the two differ (which is clearly visible).

In natural language, an analogue would be:

The boy has an apple.
There is an apple, the boy has it.

While the two (pretty much) mean the same thing, they are clearly not the same sentence. Back to JavaScript!

Now to be precise, let's look at the two examples.

The first one, var x=5, is read the following way:

var                      x              =                  5
-----------------------VariableStatement--------------------
var ------------------- VariableDeclarationList
var ------------------- VariableDeclaration
var ------------------- Identifier --- Initialiser(opt)
var ------------------- x ------------ = AssignmentExpression
var ------------------- x ------------ = ConditionalExpression
var ------------------- x ------------ = LogicalORExpression
var ------------------- x ------------ = LogicalANDExpression
var ------------------- x ------------ = BitwiseORExpression
var ------------------- x ------------ = BitwiseXORExpression
var ------------------- x ------------ = BitwiseANDExpression
var ------------------- x ------------ = EqualityExpression
var ------------------- x ------------ = RelationalExpression
var ------------------- x ------------ = ShiftExpression
var ------------------- x ------------ = AdditiveExpression
var ------------------- x ------------ = MultiplicativeExpression
var ------------------- x ------------ = UnaryExpression
var ------------------- x ------------ = PostfixExpression
var ------------------- x ------------ = LeftHandSideExpression
var ------------------- x ------------ = NewExpression
var ------------------- x ------------ = MemberExpression
var ------------------- x ------------ = PrimaryExpression
var ------------------- x ------------ = Literal
var ------------------- x ------------ = NumericLiteral
var ------------------- x ------------ = DecimalLiteral
var ------------------- x ------------ = DecimalIntegerLiteral
var ------------------- x ------------ = NonZeroDigit
var ------------------- x ------------ = 5

Phew! All this had to happen syntactically to parse var x = 5. Sure, a lot of it is handling expressions, but it is what it is. Let us check the other version.

This breaks into two statements: var x; x = 5. The first one is:

var                      x
--------VariableStatement---
var ---- VariableDeclarationList
var ---- VariableDeclaration
var ---- Identifier (optional initialiser not present)
var ---- x

The second part is x=5, which is an expression statement containing an assignment expression. I could go on with the same expression madness, but it's pretty much the same.

So, in conclusion: syntactically, as the official language grammar specifies, the two are different, even though semantically the result is, in this case, indeed the same.
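The syntactic difference also shows up in the tree a parser hands back. As a rough illustration, here are the two statement lists written by hand in the ESTree node vocabulary used by parsers such as Esprima and Acorn. This is an assumption on my part: the objects are trimmed, were not produced by running a parser, and are not the tree shape of Crockford's parser.

```javascript
// var x = 5;  -> one statement: a declaration with an initialiser.
var declarationWithInit = [
    {
        type: "VariableDeclaration",
        kind: "var",
        declarations: [
            {
                type: "VariableDeclarator",
                id: { type: "Identifier", name: "x" },
                init: { type: "Literal", value: 5 }
            }
        ]
    }
];

// var x; x = 5;  -> two statements: a bare declaration, then an
// expression statement wrapping an assignment expression.
var declarationThenAssignment = [
    {
        type: "VariableDeclaration",
        kind: "var",
        declarations: [
            {
                type: "VariableDeclarator",
                id: { type: "Identifier", name: "x" },
                init: null
            }
        ]
    },
    {
        type: "ExpressionStatement",
        expression: {
            type: "AssignmentExpression",
            operator: "=",
            left: { type: "Identifier", name: "x" },
            right: { type: "Literal", value: 5 }
        }
    }
];

// Helper (hypothetical) that lists the statement types of a body.
function statementTypes(body) {
    return body.map(function (statement) {
        return statement.type;
    });
}

console.log(statementTypes(declarationWithInit).join(", "));
console.log(statementTypes(declarationThenAssignment).join(", "));
```

Only the second form contains an assignment expression node; in the first, the = belongs to the declaration itself.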

Strix · Answer

I don't have time to read the whole article, so I am not a hundred percent sure. In my opinion, the reason is that the assignment operator in a var statement is a bit special. It doesn't accept all possible left values: no members of an object are allowed (no . or [ operators). Only plain variable names are allowed.

So we can't use the normal assignment function, because it allows all left values.
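The difference between the two sets of accepted targets can be written down as two small predicates. The function names and the hand-built nodes below are made up for illustration; the conditions themselves are taken from the two snippets quoted in the question.

```javascript
// Check used by the "=" operator's led, as quoted in the question.
function isAssignmentTarget(node) {
    return node.id === "." || node.id === "[" || node.arity === "name";
}

// Check used by the var statement: plain names only.
function isVarTarget(node) {
    return node.arity === "name";
}

// Hand-built stand-ins for parsed nodes (their exact shape is an assumption).
var plainName = { id: "(name)", arity: "name", value: "t" };
var member = { id: ".", arity: "binary", value: "." }; // e.g. obj.prop

console.log(isAssignmentTarget(plainName), isVarTarget(plainName)); // prints: true true
console.log(isAssignmentTarget(member), isVarTarget(member));       // prints: true false
```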

I am quite sure about this, but the following is just a guess:

We would have to call the assignment function optionally, and only after we checked that we consumed the assignment operator.

  advance();
  if (token.id === "=") {
      // OK, Now we know that there is an assignment.

But the assignment function assumes that the current token is a left value, not the = operator.

I have no idea why the assignment member is not set to true. It depends on what you want to do with the generated tree. Again, assignment in a var statement is a bit special, and it might not be feasible to set it.

Tags: javascript, parsing

Moritz Roessler
