I've encountered the following token in a regular expression: [\s\S]*?
If I understand this correctly, the character class means "match a whitespace character or a non-whitespace character". Therefore, would this not do exactly the same thing as .*?
One possible difference is that . usually does not match newlines. However, this regular expression was written in Ruby and was passed the m modifier, meaning that . does, in fact, match newlines.
Is there any other reason to use [\s\S]*? instead of .*?
In case it helps, the regular expression I am looking at appears inside the sprockets library in the HEADER_PATTERN constant on line 97. The full expression is:
/
\A \s* (
(\/\* ([\s\S]*?) \*\/) |
(\#\#\# ([\s\S]*?) \#\#\#) |
(\/\/ ([^\n]*) \n?)+ |
(\# ([^\n]*) \n?)+
)
/mx
The difference between \s and \s+: an expression of the form X+ matches one or more occurrences of X. Therefore, the regular expression \s matches a single whitespace character, while \s+ matches one or more consecutive whitespace characters.
The difference between \s and \S is that the former matches any whitespace character while the latter matches any non-whitespace character. Quantifiers such as + and * are greedy by default and take as many characters as they can in a given match; appending ? (as in +? or *?) makes them lazy, so they take as few as they can.
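\s stands for "whitespace character". It is equivalent to [ \t\n\x0B\f\r].
In short, \s matches a single whitespace character, whereas \s+ matches a run of one or more whitespace characters. (In languages where the pattern is written inside a string literal, such as Java, these appear as \\s and \\s+.) When the goal is to consume a whole run of whitespace, \s+ is usually the better choice, since one match covers the entire run.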
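The distinctions above (one whitespace character versus a run, and greedy versus lazy quantifiers) can be checked with a short Ruby sketch. The sample strings are made up for illustration and are not taken from sprockets.

```ruby
# Hypothetical sample string: "a", two spaces, a tab, one space, "b".
sample = "a  \t b"

one_space = sample[/\s/]    # \s matches exactly one whitespace character
space_run = sample[/\s+/]   # \s+ matches the whole run of whitespace
non_space = sample[/\S+/]   # \S+ matches the first run of non-whitespace

# Greedy versus lazy, using a made-up delimiter pair.
tagged = "<<a>> <<b>>"
greedy = tagged[/<<.+>>/]   # .+ takes as much as it can
lazy   = tagged[/<<.+?>>/]  # .+? stops at the first possible closing ">>"

p one_space  # " "
p space_run  # "  \t "
p non_space  # "a"
p greedy     # "<<a>> <<b>>"
p lazy       # "<<a>>"
```

The lazy form is the one that matters for the pattern in the question: [\s\S]*? stops at the first closing delimiter instead of running on to the last one in the input.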
You interpreted the regex correctly.
The [\s\S] construct looks like a relic from other languages or regex flavors that do not support the m flag (called the s flag, or "dotall", in most other implementations).
A reason to still use it would be to leave the m flag off, so that . keeps its usual meaning of not matching newlines, while [\s\S] remains available wherever you need to match any character, newlines included.
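To make this concrete, here is a minimal Ruby sketch. It uses a simplified fragment of the block-comment branch of the pattern rather than the full HEADER_PATTERN, and the input strings are invented for illustration.

```ruby
# Hypothetical input: a two-line block comment followed by code.
source = "/* line one\n * line two */\nvar x = 1;"

# 1. With the m flag, both spellings capture the same comment body.
with_class = /\A \s* \/\* ([\s\S]*?) \*\//mx
with_dot   = /\A \s* \/\* (.*?) \*\//mx
same_with_m = with_class.match(source)[1] == with_dot.match(source)[1]

# 2. Without the m flag, . cannot cross the newline, so the dot version
#    never reaches the closing "*/" and fails; [\s\S] still works.
class_no_m = /\A\/\*([\s\S]*?)\*\//.match(source)
dot_no_m   = /\A\/\*(.*?)\*\//.match(source)

# 3. Leaving m off lets one pattern mix both behaviors: (.*) is confined
#    to the first line, while [\s\S]*? spans lines up to the END marker.
doc = "title: demo\nline one\nline two\nEND\ntrailer"
mixed = doc.match(/\A(.*)\n([\s\S]*?)\nEND/)

p same_with_m    # true
p dot_no_m       # nil
p class_no_m[1]  # " line one\n * line two "
p mixed[1]       # "title: demo"
p mixed[2]       # "line one\nline two"
```

Since the pattern in the question already sets the m flag, case 1 applies there: swapping [\s\S]*? for .*? should not change what it matches.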