
Explain regex that finds CSS comments

Tags: regex, css

I found this regex, which matches comments, on w3.org's CSS grammar page.

\/\*[^*]*\*+([^/*][^*]*\*+)*\/

It is quite long and a bit difficult to understand. I would have just written

\/\*.*\*\/

to find comments, but when I tested it in RegexPal it found single-line comments but not multi-line ones, whereas the original regex finds all types of comments.

I don't understand what the

+([^/*][^*]*\*+)*

part inside the original regex does. Can anyone explain this to me?

Vigneshwaran asked Feb 17 '12 13:02


2 Answers

Token by token explanation:

\/    <- an escaped '/', matches '/'
\*    <- an escaped '*', matches '*'
[^*]* <- a negated character class with quantifier, matches anything but '*' zero or more times
\*+   <- an escaped '*' with quantifier, matches '*' once or more
(     <- beginning of group 
[^/*] <- negated character class, matches anything but '/' or '*' once
[^*]* <- negated character class with quantifier, matches anything but '*' zero or more times
\*+   <- escaped '*' with quantifier, matches '*' once or more
)*    <- end of group with quantifier, matches group zero or more times
\/    <- an escaped '/', matches '/'
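
The same breakdown can be written as a commented pattern. What follows is a minimal sketch in Python, chosen only for illustration since the question does not name a language. It uses re.VERBOSE so that each token sits on its own line; the name CSS_COMMENT and the sample string are hypothetical, and the group is made non-capturing so that findall returns whole matches.

```python
import re

# The pattern from the question, split token by token.
# re.VERBOSE ignores whitespace and '#' comments outside character classes.
CSS_COMMENT = re.compile(r"""
    /\*              # literal '/*' opens the comment ('/' needs no escape in Python)
    [^*]*            # anything but '*', zero or more times
    \*+              # one or more '*'
    (?:              # group (non-capturing here, so findall returns whole matches)
        [^/*]        # one character that is neither '/' nor '*'
        [^*]*        # anything but '*', zero or more times
        \*+          # one or more '*'
    )*               # repeat the group zero or more times
    /                # literal '/' closes the comment
""", re.VERBOSE)

# Hypothetical sample: one single-line and one multi-line comment.
sample = "a { color: red; } /* one line */ b {\n/* two\n   lines **/ }"
matches = CSS_COMMENT.findall(sample)
print(matches)
```

Both comments should be found, including the one that spans two lines and ends in **/.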

Regex Reference

Analysis on Regexper.com

nikc.org answered Nov 08 '22 18:11


The reason yours finds only single-line comments is that, in typical regular expression engines, . matches any character except a newline, whereas the other pattern uses negated character classes, which match anything but the listed characters and so can match newlines.
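
A quick way to see this difference, sketched here in Python's re module (an assumption for illustration; other engines may differ in detail), is to run both patterns on a hypothetical two-line comment:

```python
import re

# Hypothetical two-line comment used only for this check.
text = "/* first\n   second */"

# '.' stops at the newline, so the simple pattern cannot span both lines.
dot_match = re.search(r"/\*.*\*/", text)

# A negated class such as [^*] has no such restriction and matches '\n' too.
class_match = re.search(r"/\*[^*]*\*+([^/*][^*]*\*+)*/", text)

print(dot_match)               # None: no match across the line break
print(class_match.group(0))    # the whole two-line comment
```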

However, if you were to fix that (most engines have an option, often called "dotall" or "single line" mode, that makes . match newlines as well), you would find that it matches from the /* of the first comment to the */ of the last comment; you would have to use a non-greedy quantifier, .*?, to match no more than one comment.
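
The greedy versus non-greedy behaviour can be checked with a short sketch. It assumes Python's re module, where the flag that lets . match newlines is re.DOTALL; the sample stylesheet text is hypothetical.

```python
import re

# Hypothetical input with two separate comments, the second spanning two lines.
css = "/* one */ a { } /* two\n   lines */ b { }"

# re.DOTALL lets '.' match newlines; greedy '.*' then runs to the last '*/'.
greedy = re.findall(r"/\*.*\*/", css, re.DOTALL)

# The non-greedy '.*?' stops at the first '*/' after each '/*'.
lazy = re.findall(r"/\*.*?\*/", css, re.DOTALL)

print(greedy)   # one match that swallows the rule between the comments
print(lazy)     # two matches, one per comment
```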

However, the regular expression you found is more elaborate than even that. Based on nikc.org's answer, I believe it is written so that a comment ends at the first */ without relying on a non-greedy quantifier, which lex-style grammars like the one on that page do not provide. This is also how the rule that “comments may not be nested” works out in practice: an inner /* is treated as ordinary comment text, so a comment /* like /* this */ ends at the first */. Where non-greedy quantifiers are available, the pattern \/\*.*?\*\/ (with . allowed to match newlines) would be appropriate to match such comments.

Kevin Reid answered Nov 08 '22 19:11