Out of interest, I want to learn how to write a parser for a simple language, so that I can eventually write an interpreter for my own little code-golfing language once I understand how such things work in general.
So I started reading Douglas Crockford's article Top Down Operator Precedence.
Note: You should probably read the article if you want a deeper understanding of the context of the code snippets below.
I have trouble understanding how the var statement and the assignment operator = should work together.
D.C. defines an assignment operator like
var assignment = function (id) {
    return infixr(id, 10, function (left) {
        if (left.id !== "." && left.id !== "[" &&
                left.arity !== "name") {
            left.error("Bad lvalue.");
        }
        this.first = left;
        this.second = expression(9);
        this.assignment = true;
        this.arity = "binary";
        return this;
    });
};
assignment("=");
Note: [[value]] refers to a token, simplified to its value.
Now if the expression function reaches e.g. [[t]],[[=]],[[2]], the result of [[=]].led is something like this:
{
    "arity": "binary",
    "value": "=",
    "assignment": true, //<-
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "2"
    }
}
D.C. makes the assignment function because
"we want it to do two extra bits of business: examine the left operand to make sure that it is a proper lvalue, and set an assignment member so that we can later quickly identify assignment statements."
This makes sense to me up to the point where he introduces the var statement, which is defined as follows.
"The var statement defines one or more variables in the current block. Each name can optionally be followed by = and an initializing expression."
stmt("var", function () {
    var a = [], n, t;
    while (true) {
        n = token;
        if (n.arity !== "name") {
            n.error("Expected a new variable name.");
        }
        scope.define(n);
        advance();
        if (token.id === "=") {
            t = token;
            advance("=");
            t.first = n;
            t.second = expression(0);
            t.arity = "binary";
            a.push(t);
        }
        if (token.id !== ",") {
            break;
        }
        advance(",");
    }
    advance(";");
    return a.length === 0 ? null : a.length === 1 ? a[0] : a;
});
Now if the parser reaches a set of tokens like [[var]],[[t]],[[=]],[[1]], the generated tree would look something like this:
{
    "arity": "binary",
    "value": "=",
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}
The key part of my question is the if (token.id === "=") {...} part.
I don't understand why we call
t = token;
advance("=");
t.first = n;
t.second = expression(0);
t.arity = "binary";
a.push(t);
rather than
t = token;
advance("=");
t.led(n);
a.push(t);
in the {...} part.
That would call our [[=]] operator's led function (the one set up by the assignment function), which does make sure that it is a proper lvalue and sets an assignment member so that we can later quickly identify assignment statements, e.g.:
{
    "arity": "binary",
    "value": "=",
    "assignment": true,
    "first": {
        "arity": "name",
        "value": "t"
    },
    "second": {
        "arity": "literal",
        "value": "1"
    }
}
Since there is no operator with an lbp strictly between 0 and 10, calling expression(0) vs. expression(9) makes no difference (!(0<0) && !(9<0) && 0<10 && 9<10).
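A minimal sketch of that claim, assuming a stripped-down expression loop in the spirit of the article. The names makeParser and sampleTokens and the token shapes are invented for this example and are not taken from the article; only "=" (lbp 10) and the end marker (lbp 0) are modeled.

```javascript
// Stripped-down Pratt-style parser: just enough to compare right binding powers.
function makeParser(tokens) {
    let pos = 0;
    // Left binding powers as described in the question: "=" is 10, everything else 0.
    const lbp = { "=": 10, "(end)": 0 };

    function peek() {
        return tokens[pos] || { id: "(end)", value: "(end)" };
    }

    function expression(rbp) {
        const t = tokens[pos++];
        // nud: names and literals stand for themselves.
        let left = { arity: t.id === "(name)" ? "name" : "literal", value: t.value };
        while (rbp < (lbp[peek().id] || 0)) {
            const op = tokens[pos++];
            // led for "=": right associative, so the right side uses 10 - 1.
            left = { arity: "binary", value: op.value, first: left, second: expression(9) };
        }
        return left;
    }

    return { expression };
}

// Initializer tokens for something like: var t = a = 1;
const sampleTokens = [
    { id: "(name)", value: "a" },
    { id: "=", value: "=" },
    { id: "(literal)", value: "1" }
];

const tree0 = JSON.stringify(makeParser(sampleTokens).expression(0));
const tree9 = JSON.stringify(makeParser(sampleTokens).expression(9));
const tree10 = JSON.stringify(makeParser(sampleTokens).expression(10));

console.log(tree0 === tree9);  // true: 0 and 9 behave the same here
console.log(tree0 === tree10); // false: 10 stops before the "="
```

Only a right binding power of 10 or more changes the result, because that is the first value that is no longer smaller than the lbp of "=".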
And the token.id === "=" condition prevents assignments to an object member, since token.id would then be either '[' or '.' and t.led wouldn't be called.
My question in short is:
Why do we not call the led function of the assignment operator that may optionally follow a variable declaration, and instead manually set the first and second members of the statement, but not the assignment member?
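The two variants can be compared side by side in a small self-contained sketch. It assumes a stubbed expression function that always yields the literal 1, and the helper names makeAssignToken, manualInit and ledInit are hypothetical. The led body mirrors the article's assignment function, except that it throws instead of calling left.error.

```javascript
// Stub for the article's expression(rbp); it always returns the literal 1 (assumption).
function expression(rbp) {
    return { arity: "literal", value: "1" };
}

// Builds a fresh "=" token whose led mirrors the article's assignment function.
function makeAssignToken() {
    return {
        id: "=",
        value: "=",
        led: function (left) {
            if (left.id !== "." && left.id !== "[" && left.arity !== "name") {
                throw new Error("Bad lvalue.");
            }
            this.first = left;
            this.second = expression(9);
            this.assignment = true;
            this.arity = "binary";
            return this;
        }
    };
}

// Variant A: what the var statement does in the article.
function manualInit(n) {
    const t = makeAssignToken();
    t.first = n;
    t.second = expression(0);
    t.arity = "binary";
    return t;
}

// Variant B: the alternative proposed in the question.
function ledInit(n) {
    const t = makeAssignToken();
    return t.led(n);
}

const nameToken = { id: "(name)", arity: "name", value: "t" };
const manual = manualInit(nameToken);
const viaLed = ledInit(nameToken);

console.log(manual.assignment); // undefined
console.log(viaLed.assignment); // true
```

Under these assumptions the two nodes have the same first, second and arity members and differ only in the assignment flag.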
Here are two fiddles parsing a simple string: one using the original code and one using the assignment operator's led.
When parsing a language, two things matter: semantics and syntax.
Semantically, var x=5; and var x;x=5 seem very close if not identical (since in both cases a variable is first declared and then a value is assigned to that declared variable). This is what you've observed, and it is correct for the most part.
Syntactically however, the two differ (which is clearly visible).
In natural language, an analogue would be:
Now to be concise! Let's look at the two examples.
While the two (pretty much) mean the same thing, they are clearly not the same sentence. Back to JavaScript!
The first one, var x=5, is read the following way:
var x = 5
-----------------------VariableStatement--------------------
var ------------------- VariableDeclarationList
var ------------------- VariableDeclaration
var Identifier ------- Initialiser(opt)
var ------------------- x = AssignmentExpression
var ------------------- x ------------ = LogicalORExpression
var ------------------- x ------------ = LogicalANDExpression
var ------------------- x ------------ = BitwiseORExpression
var ------------------- x ------------ = BitwiseXORExpression
var ------------------- x ------------ = BitwiseANDExpression
var ------------------- x ------------ = EqualityExpression
var ------------------- x ------------ = ShiftExpression
var ------------------- x ------------ = AdditiveExpression
var ------------------- x ------------ = MultiplicativeExpression
var ------------------- x ------------ = UnaryExpression
var ------------------- x ------------ = PostfixExpression
var ------------------- x ------------ = NewExpression
var ------------------- x ------------ = MemberExpression
var ------------------- x ------------ = PrimaryExpression
var ------------------- x ------------ = Literal
var ------------------- x ------------ = NumericLiteral
var ------------------- x ------------ = DecimalLiteral
var ------------------- x ------------ = DecimalDigit
var ------------------- x ------------ = 5
Phew! All this had to happen syntactically to parse var x = 5. Sure, a lot of it is handling expressions, but it is what it is. Let us check the other version.
This breaks into two statements. var x; x = 5
The first one is:
var x
--------VariableStatement---
var ---- VariableDeclarationList
var ---- VariableDeclaration
var Identifier (optional initializer not present)
var x
The second part is x=5, which is an assignment statement. I could go on with the same expression madness, but it's pretty much the same.
So in conclusion, while the two produce the same result semantically, they are different syntactically, as the official language grammar specifies. The result, in this case, is indeed the same.
I don't have time to read the whole article, so I am not a hundred percent sure. In my opinion, the reason is that the assignment operator in a var statement is a bit special. It doesn't accept all possible left values: no members of an object are allowed (no . or [ operators). Only plain variable names are allowed.
So we can't use the normal assignment function, because it allows all left values.
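To make the difference between the two checks concrete, here is a small sketch. The predicate names isAssignableLvalue and isDeclarableName are made up for this example; the conditions themselves are copied from the assignment function and the var statement quoted in the question.

```javascript
// Condition from the article's assignment function: names and member accesses are fine.
function isAssignableLvalue(left) {
    return left.id === "." || left.id === "[" || left.arity === "name";
}

// Condition from the article's var statement: only a plain name may be declared.
function isDeclarableName(tok) {
    return tok.arity === "name";
}

// Sample nodes (shapes assumed for this sketch): the name t and a member access like a.b.
const plainName = { id: "(name)", arity: "name", value: "t" };
const memberAccess = { id: ".", arity: "binary", value: "." };

console.log(isAssignableLvalue(plainName), isDeclarableName(plainName));       // true true
console.log(isAssignableLvalue(memberAccess), isDeclarableName(memberAccess)); // true false
```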
I am quite sure about this, but the following is just a guess:
We would have to call the assignment function optionally, and only after we checked that we consumed the assignment operator.
advance();
if (token.id === "=") {
// OK, Now we know that there is an assignment.
But the function assignment assumes that the current token is a left value, not the operator =.
I have no idea why the assignment member is not set to true. It depends on what you want to do with the generated tree. Again, assignment in a var statement is a bit special, and it might not be feasible to set it.