I found this regex for matching comments on w3.org's CSS grammar page.
\/\*[^*]*\*+([^/*][^*]*\*+)*\/
It is quite long and a bit difficult to understand. I'd just put
\/\*.*\*\/
to find comments, but when I tested it in RegexPal it found single-line comments and not multi-line comments, whereas the original regex finds both.
I don't understand what the
+([^/*][^*]*\*+)*
part inside the original regex does. Can anyone explain this to me?
Token by token explanation:
\/ <- an escaped '/', matches '/'
\* <- an escaped '*', matches '*'
[^*]* <- a negated character class with quantifier, matches anything but '*' zero or more times
\*+ <- an escaped '*' with quantifier, matches '*' once or more
( <- beginning of group
[^/*] <- negated character class, matches anything but '/' or '*' once
[^*]* <- negated character class with quantifier, matches anything but '*' zero or more times
\*+ <- escaped '*' with quantifier, matches '*' once or more
)* <- end of group with quantifier, matches group zero or more times
\/ <- an escaped '/', matches '/'
Regex Reference
Analysis on Regexper.com
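To see the token-by-token breakdown above in action, here is a minimal Python sketch. The use of Python's `re` module, the name `CSS_COMMENT`, and the sample string are assumptions made for illustration. The pattern is the one from the question, except that `/` is left unescaped (Python does not need it escaped) and the group is made non-capturing.

```python
import re

# The pattern from the question, spread out with re.VERBOSE so each token
# can carry its own comment. CSS_COMMENT is a name chosen for this sketch.
CSS_COMMENT = re.compile(r"""
    /\*             # the opening '/*'
    [^*]*           # anything but '*', zero or more times (newlines included)
    \*+             # one or more '*'
    (?:             # start of group (non-capturing here, unlike the original)
        [^/*]       #   one character that is neither '/' nor '*'
        [^*]*       #   anything but '*', zero or more times
        \*+         #   one or more '*'
    )*              # end of group, repeated zero or more times
    /               # the closing '/'
""", re.VERBOSE)

css = "a { color: red; } /* first\n   comment **/ b { } /* second */"
comments = [m.group(0) for m in CSS_COMMENT.finditer(css)]
print(comments)
```

The group only continues when a run of `*` is followed by something other than `/`, which is why each match stops at the first `*/` it reaches.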
The reason yours finds only single line comments is that, in typical regular expressions, .
matches anything except newlines; whereas the other one uses a negated character class which matches anything but the specified characters, and so can match newlines.
However, if you were to fix that (there's usually a "dot matches newline" option, often called "dotall" or "single line" mode; "multiline" mode, by contrast, usually only changes what ^ and $ match), you would find that it would match from the /*
of the first comment to the */
of the last comment; you would have to use a non-greedy quantifier, .*?
, to match no more than one comment.
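The three behaviours described above can be compared side by side. This is a small sketch assuming Python's `re` module, where the dot-matches-newline option is `re.DOTALL`; the sample string and variable names are made up for the example.

```python
import re

css = "/* one */ a { }\n/* two\n   lines */ b { }"

# Default: '.' stops at newlines, so only the single-line comment is found.
plain = re.findall(r"/\*.*\*/", css)

# re.DOTALL lets '.' match newlines, but the greedy '.*' now runs from the
# first '/*' all the way to the last '*/'.
greedy = re.findall(r"/\*.*\*/", css, re.DOTALL)

# A lazy quantifier ('.*?') stops at the first '*/' after each '/*'.
lazy = re.findall(r"/\*.*?\*/", css, re.DOTALL)

print(plain)
print(greedy)
print(lazy)
```

Only the last variant returns each comment separately.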
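However, the more complex regular expression you give gets the same result without either of those features. After each run of * it requires a character other than / before continuing, so the match cannot run past the first */, and because it is built from negated character classes it can span newlines. Based on nikc.org's answer, I believe this corresponds to the restriction that “comments may not be nested”: an inner /* does not open a second comment, and the comment ends at the first */.
It does not forbid /* inside a comment, so a comment /* like /* this */ is matched whole. Where the regex engine supports non-greedy quantifiers and a dot-matches-newline option, the pattern \/\*.*?\*\/ is the simpler way to match the same comments.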
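How the two patterns treat an inner /* can be checked directly. The following is a rough sketch assuming Python's `re` module; the names `W3C` and `LAZY` and the sample strings are invented for the test, and the lazy pattern is compiled with `re.DOTALL`.

```python
import re

# The pattern from the question (group made non-capturing, '/' unescaped).
W3C = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")
# The shorter lazy pattern, with '.' allowed to match newlines.
LAZY = re.compile(r"/\*.*?\*/", re.DOTALL)

samples = [
    "/* plain */ x",
    "/* like /* this */ x",
    "/* outer /* inner */ still outer? */",
    "/* stars ***/ x",
]

# For each sample, record what each pattern matches first.
results = [(W3C.search(t).group(0), LAZY.search(t).group(0)) for t in samples]
for w3c_match, lazy_match in results:
    print(repr(w3c_match), repr(lazy_match))
```

On these inputs both patterns stop at the first `*/`, and neither rejects a comment because it contains `/*`.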