I was having a look at the CSS syntax here and here and I was amazed to see both the token productions and the grammar littered with whitespace declarations. Normally whitespace is defined once in the lexer and skipped, never to be seen again. Ditto comments.
I imagine the orientation towards user-agents rather than true compilers is part of the motivation here, and also the requirement to proceed in the face of errors, but it still seems pretty odd.
Are real-life UAs that parse CSS really implemented according to this (these) grammars?
EDIT: the reason for the question is actually the various LESS implementations: less.js doesn't understand consecutive comments, and lessc.exe doesn't understand comments inside selectors. In this respect they are not even able to parse CSS correctly, however that is defined. So I went to see what the actual grammar of CSS was and ...
CSS, while similar to many programming languages, does have some rare instances where whitespace can be important.
Say we have the following base markup:
<html>
  <head>
    <style type="text/css">
      .blueborder { width:200px; height:200px; border: solid 2px #00f; }
      .redborder { width:100px; height:100px; margin:50px; border: solid 2px #f00; }
    </style>
  </head>
  <body>
    <div class="blueborder">
      <div class="redborder"></div>
    </div>
  </body>
</html>
There's nothing special here: just a div inside another div, each with some styles so that you can tell them apart.
Now let's add another class and an ID to the outer div:
<div class="blueborder MyClass" id="MyDiv">
  <div class="redborder"></div>
</div>
If I want to give a background to the outer div in the following manner:
.MyClass#MyDiv { background: #ccc; }
...then the whitespace becomes important. The rule above does style the outer div, because it is parsed differently from the following:
.MyClass #MyDiv { background: #ccc; }
...which does NOT style the outer div.
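The difference is between a compound selector (both conditions on the same element) and a descendant combinator (the ID on some element whose ancestor has the class). Here is a minimal Python sketch of those two matching rules on the markup above, assuming a hypothetical `Element` class (not a real DOM API) and handling only the class and ID conditions used in this example:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    # Hypothetical stand-in for a DOM node: tag, class list, id, parent link.
    tag: str
    classes: set = field(default_factory=set)
    id: Optional[str] = None
    parent: Optional["Element"] = None


def matches_compound(el, cls, id_):
    """.cls#id -- both conditions must hold on the SAME element."""
    return cls in el.classes and el.id == id_


def matches_descendant(el, cls, id_):
    """.cls #id -- el has the id AND some ancestor has the class."""
    if el.id != id_:
        return False
    anc = el.parent
    while anc is not None:
        if cls in anc.classes:
            return True
        anc = anc.parent
    return False


# The tree from the example: body > div.blueborder.MyClass#MyDiv > div.redborder
body = Element("body")
outer = Element("div", {"blueborder", "MyClass"}, "MyDiv", parent=body)
inner = Element("div", {"redborder"}, parent=outer)

for name, el in (("outer", outer), ("inner", inner)):
    print(name,
          matches_compound(el, "MyClass", "MyDiv"),
          matches_descendant(el, "MyClass", "MyDiv"))
```

Only the compound form matches the outer div; the descendant form would need an element with id MyDiv nested inside a .MyClass element, and there isn't one.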
To see how these are parsed differently, you can look at how the selectors are tokenized:
Example1:
.MyClass#MyDiv -> DELIM IDENT HASH
Example2:
.MyClass #MyDiv -> DELIM IDENT S HASH
If we were to blindly ignore whitespace (as compilers usually do), we would miss this difference.
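To make that concrete, here is a rough Python tokenizer for a small subset of the CSS 2.1 tokens (DELIM, IDENT, HASH, S, plus COMMENT). The regexes are simplified assumptions (no escapes, no non-ASCII, no strings), so treat it as a sketch rather than a conforming lexer. It drops comments without emitting an S token, since comments are not part of the grammar, and it can optionally skip S the way a conventional compiler lexer would:

```python
import re

# Simplified subset of CSS 2.1 tokens. Order matters: COMMENT and HASH must be
# tried before the catch-all DELIM.
TOKEN_SPEC = [
    ("COMMENT", r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/"),
    ("S", r"[ \t\r\n\f]+"),
    ("HASH", r"#[A-Za-z0-9_-]+"),
    ("IDENT", r"-?[A-Za-z_][A-Za-z0-9_-]*"),
    ("DELIM", r"."),
]
CSS_TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text, skip_whitespace=False):
    """Return the token kinds for a selector string."""
    kinds = []
    for m in CSS_TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "COMMENT":
            continue  # comments produce no token at all
        if kind == "S" and skip_whitespace:
            continue  # what a "normal" lexer would do
        kinds.append(kind)
    return kinds


for sel in (".MyClass#MyDiv", ".MyClass #MyDiv"):
    print(f"{sel!r:20}", " ".join(tokenize(sel)), "|", " ".join(tokenize(sel, True)))
```

With `skip_whitespace=True` both selectors collapse to the same DELIM IDENT HASH stream, which is exactly the information loss described above.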
With that being said, I am not implying that this grammar is good. I also have a fair amount of experience writing grammars, and I cringe when looking at this one. The easiest solution would have been to fold the # and . symbols into the IDENT token, and then everything else becomes much easier.
However, they did not choose to do this, and the need for whitespace in the grammar is an artifact of that decision.
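A rough illustration of that alternative, purely hypothetical and not how CSS is actually specified: if a single IDENT-like token may start with and contain `.` and `#`, then `.MyClass#MyDiv` is one token while `.MyClass #MyDiv` is two, so the lexer can throw whitespace away and the parser can read "two adjacent tokens" as the descendant combinator. This sketch only covers simple selectors of the shape used in this example:

```python
import re

# Hypothetical token: optional leading '.' or '#', then further '.name' / '#name'
# parts glued on, so a whole compound selector lexes as ONE token.
NAME = r"[A-Za-z_][A-Za-z0-9_-]*"
FOLDED_RE = re.compile(
    rf"(?P<IDENT>[.#]?{NAME}(?:[.#]{NAME})*)|(?P<S>\s+)|(?P<DELIM>.)"
)


def tokenize_folded(text):
    """Tokenize with whitespace skipped entirely, like a conventional lexer."""
    return [(m.lastgroup, m.group())
            for m in FOLDED_RE.finditer(text)
            if m.lastgroup != "S"]


print(tokenize_folded(".MyClass#MyDiv"))
print(tokenize_folded(".MyClass #MyDiv"))
```

The token boundaries now carry the distinction that the S token carries in the real grammar, at the cost of having to split compound tokens apart again later.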