 

What is the difference between [\s\S]*? and .*?

Tags: regex, ruby

I've encountered the following token in a regular expression: [\s\S]*?

If I understand this correctly, the character class means "match a whitespace character or a non-whitespace character". Therefore, would this not do exactly the same thing as .*?

One possible difference is that usually . does not match newlines. However, this regular expression was written in Ruby and was passed the m modifier meaning that the . does, in fact, match newlines.

Is there any other reason to use [\s\S]*? instead of .*?

In case it helps, the regular expression I am looking at appears inside the sprockets library in the HEADER_PATTERN constant on line 97. The full expression is:

/
  \A \s* (
    (\/\* ([\s\S]*?) \*\/) |
    (\#\#\# ([\s\S]*?) \#\#\#) |
    (\/\/ ([^\n]*) \n?)+ |
    (\# ([^\n]*) \n?)+
  )
/mx
Rupert Madden-Abbott asked May 29 '11 16:05





1 Answer

You interpreted the regex correctly.
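As a quick check of that reading, here is a minimal sketch in Ruby. The sample string and variable names are made up for illustration; only the two tokens under discussion come from the question.

```ruby
# Hypothetical sample input: a block comment that spans two lines.
text = "/* first line\nsecond line */ trailing"

# With the m flag, Ruby's . also matches newlines.
dot_with_m = text[/\/\*(.*?)\*\//m, 1]

# The character class matches any character, with or without the m flag.
class_without_m = text[/\/\*([\s\S]*?)\*\//, 1]

# Without the m flag, . cannot cross the newline, so nothing matches.
dot_without_m = text[/\/\*(.*?)\*\//, 1]

puts dot_with_m == class_without_m  # the two captures are identical
p dot_without_m                     # nil
```

So under the m flag the two forms capture the same text here; they only diverge once the flag is dropped.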

That looks like a relic from other languages whose regex engines do not support the m flag (called the s flag in other implementations).

One reason to use that construct is to leave the m flag off: . then keeps its default behaviour of not matching newlines, while [\s\S] is still available wherever you need to match any character at all.
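A small sketch of that mixed usage, assuming a made-up input string and a pattern written only for this illustration (it is not the sprockets pattern):

```ruby
# Hypothetical input: a ### block spanning two lines, then more text.
source = "### header\nspans lines ### rest of line one\nline two"

# No m flag here. [\s\S]*? crosses the newline inside the ### block,
# while .* stops at the end of the current line.
pattern = /\#\#\#([\s\S]*?)\#\#\# (.*)/

match = source.match(pattern)
block = match[1]
tail  = match[2]

p block  # " header\nspans lines "
p tail   # "rest of line one"
```

Both behaviours are available in one expression, which would not be possible if the m flag changed the meaning of every . in the pattern.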

marsbear answered Sep 24 '22 05:09
