
What does the ? mean in the tokenization section of the W3C CSS specification?

Ever since my awful experience with cssparser, I have set myself the task of implementing a CSS parser in Java using Parboiled. I already have all the color specifications covered, but of course I need all the rest...

So, I went to look for the CSS specification and found it on the W3C website. I am now in the process of writing rules for all "atoms", but found something disturbing in this section:

UNICODE-RANGE   u\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?

The part that disturbs me is the question mark in [0-9a-f?].

The paragraph heading says the regular expressions used here are Lex-style, and ? has no special meaning inside a character class (thanks @sczizzo for the confirmation). So, is this a typo in the W3C specification, or is ? really allowed in a Unicode range? If so, what does it mean?

Wrap up: I have my answer. However, the regex in the specification is too permissive: a Unicode range written with question marks can only stand by itself, and cannot also be the start of an interval. Yet the regex above accepts this expression, which is clearly illegal: u+4??-733f
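The claim that the regex accepts u+4??-733f is easy to check. Below is a minimal sketch that transcribes the Lex-style pattern to java.util.regex; the class name, the helper method and the use of CASE_INSENSITIVE are my own assumptions, not something taken from the specification or from Parboiled.

```java
import java.util.regex.Pattern;

class UnicodeRangeRegex {
    // The pattern from the specification, transcribed to Java regex syntax.
    // Inside the character class, '?' is a literal question mark.
    // CASE_INSENSITIVE is an assumption made for this sketch.
    static final Pattern UNICODE_RANGE = Pattern.compile(
        "u\\+[0-9a-f?]{1,6}(-[0-9a-f]{1,6})?", Pattern.CASE_INSENSITIVE);

    // True if the whole input matches the pattern.
    static boolean matches(String input) {
        return UNICODE_RANGE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("u+4??"));      // wildcard form
        System.out.println(matches("u+400-4ff"));  // interval form
        System.out.println(matches("u+4??-733f")); // mixed form, also accepted by the regex
    }
}
```

All three inputs match, so a parser that wants to reject the mixed form has to do so with an extra check on top of the tokenizer regex.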

fge asked Dec 28 '11 02:12


1 Answer

Yup, that's a literal question mark. From the Flex documentation:

Note that inside of a character class, all regular expression operators lose their special meaning except escape ('\') and the character class operators, '-', ']', and, at the beginning of the class, '^'.

Now, according to the W3C, ? can be used as a kind of wildcard:

? characters imply 'any digit value' (e.g. U+4??)
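To make the wildcard reading concrete, here is a rough sketch of how a parser might expand such a token into a numeric interval. It assumes "any digit value" means any hexadecimal digit, so each ? becomes 0 for the lower bound and F for the upper bound. The class and method names are hypothetical, and the input is assumed to be already validated as a wildcard-only range.

```java
class WildcardRange {
    // Returns {low, high} for a wildcard range such as "U+4??".
    // Assumes the token starts with "U+" or "u+" and contains only
    // hex digits and '?' after that (at most 6 characters).
    static int[] expand(String token) {
        String digits = token.substring(2); // drop the leading "U+"
        int low = Integer.parseInt(digits.replace('?', '0'), 16);
        int high = Integer.parseInt(digits.replace('?', 'F'), 16);
        return new int[] { low, high };
    }

    public static void main(String[] args) {
        int[] range = expand("U+4??");
        // prints "U+0400..U+04FF"
        System.out.println(String.format("U+%04X..U+%04X", range[0], range[1]));
    }
}
```

Under that assumption, U+4?? covers the same code points as the interval U+400-4FF.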

sczizzo answered Sep 20 '22 07:09