Ever since my awful experience with cssparser, I have set myself the task of implementing a CSS parser in Java using Parboiled. I already have the color specification covered, but of course I need all the rest...
So, I went looking for the CSS specification and found it on the W3C website. I am now writing rules for all the "atoms", but found something disturbing in this section:
UNICODE-RANGE u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?
The part that disturbs me is the question mark in [0-9a-f?].
The paragraph heading says the regular expressions used here are Lex-style, and in Lex the ? has no special meaning inside a character class (thanks @scizzo for the confirmation). So, is this a typo in the W3C specification, or is ? really allowed in a Unicode range? If yes, what does it mean?
Wrap-up: I have my answer. However, the specification's regex is too permissive: a "question mark" Unicode range can only stand by itself, without a "-end" part. The regex above accepts u+4??-733f, which is clearly illegal.
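The over-permissiveness is easy to check with java.util.regex. The sketch below transcribes the spec's Lex pattern as-is and compares it with a hypothetical stricter pattern (my own, not from the spec) that only allows '?' wildcards when there is no "-end" part. It also assumes the '?' characters must be trailing, which is an extra assumption on top of what the spec regex says. Matching is done case-insensitively, which is likewise an assumption about how the tokenizer is used.

```java
import java.util.regex.Pattern;

public class UnicodeRangeCheck {
    // The W3C Lex pattern, transcribed literally. Inside [...] the '?' is a
    // literal character, so "4??" is accepted and can be followed by "-733f".
    static final Pattern SPEC = Pattern.compile(
            "u\\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?", Pattern.CASE_INSENSITIVE);

    // Hypothetical stricter pattern (assumption, not from the spec):
    //  - either a plain "start" or "start-end" range of hex digits,
    //  - or 1 to 6 characters made of hex digits followed by trailing '?',
    //    with nothing after them (no "-end" part).
    static final Pattern STRICT = Pattern.compile(
            "u\\+(?:[0-9a-f]{1,6}(?:-[0-9a-f]{1,6})?"
            + "|(?=[0-9a-f?]{1,6}$)[0-9a-f]*\\?+)",
            Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) {
        String[] inputs = {"u+4??", "U+0-7F", "u+4??-733f"};
        for (String s : inputs) {
            // matches() requires the whole input to match the pattern
            System.out.printf("%-12s spec=%b strict=%b%n",
                    s, SPEC.matcher(s).matches(), STRICT.matcher(s).matches());
        }
    }
}
```

Running it reports spec=true for all three inputs, while the stricter pattern rejects only u+4??-733f.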
Yup, that's a literal question mark. From the Flex documentation:
Note that inside of a character class, all regular expression operators lose their special meaning except escape ('\') and the character class operators, '-', ']', and, at the beginning of the class, '^'.
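Java's java.util.regex follows the same convention as Flex on this point, so the spec's character class can be reused verbatim when prototyping. A minimal demonstration:

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    public static void main(String[] args) {
        // Outside a class, '?' is a quantifier: "a?" means "zero or one 'a'",
        // so it matches "" or "a", but not the string "?".
        System.out.println(Pattern.matches("a?", "?"));           // false
        // Inside a class, '?' loses its special meaning and is a plain character.
        System.out.println(Pattern.matches("[a?]", "?"));         // true
        // Hence the spec's class accepts hex digits and literal question marks.
        System.out.println(Pattern.matches("[0-9a-f?]+", "4??")); // true
    }
}
```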
Now, according to the W3C, ? can be used as a kind of wildcard:

'?' characters imply 'any digit value' (e.g. U+4??)

Here "digit" means hex digit, so U+4?? covers U+400 through U+4FF.
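When building the parser's value objects, a wildcard range can be turned into an ordinary inclusive range by substituting 0 for every '?' to get the low bound and F to get the high bound. The helper below is hypothetical (not part of Parboiled or any CSS library) and assumes its input has already been validated as 1 to 6 hex digits and '?' characters.

```java
public class WildcardRange {
    /**
     * Hypothetical helper: converts the hex part of a wildcard range such as
     * "4??" into inclusive code point bounds {low, high}. Each '?' stands for
     * any hex digit, so it becomes 0 in the low bound and f in the high bound.
     * This is exact only when the '?' characters are trailing; for a '?' in
     * the middle (e.g. "?4") the resulting span would be too wide.
     */
    static int[] bounds(String hex) {
        int low = Integer.parseInt(hex.replace('?', '0'), 16);
        int high = Integer.parseInt(hex.replace('?', 'f'), 16);
        return new int[] {low, high};
    }

    public static void main(String[] args) {
        int[] r = bounds("4??");
        System.out.printf("U+%X-%X%n", r[0], r[1]); // prints U+400-4FF
    }
}
```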