
Is [\s\S] the same as . (dot)?

When we include a shorthand character class and its negated counterpart in the same character class, is the result the same as the dot . which means any character?

I did a test on regex101.com and every character matched.

Are [\s\S], [\w\W] and [\d\D] the same as . ?

I want to know whether this behavior is consistent across the web's front-end and back-end languages, such as JavaScript, PHP, Python and others.

Rahul asked May 29 '17 15:05




2 Answers

No, it is not the same. There is an important difference when you are not using the single line flag: without that flag, . does not match line breaks, so it does not match every character.

[\s\S] comes in handy when you want to mix both kinds of match in one pattern while . does not match every character.

It is easier to explain with an example. Suppose you want to capture whatever is between a and b. You can use the pattern a(.*?)b (the ? makes the match non-greedy and the parentheses capture the content). If the content may contain line breaks and you want to capture that case in a separate group, you can use another regex such as a([\s\S]*?)b.

Therefore if we create one pattern using both approaches it results in:

a(.*?)b|a([\s\S]*?)b
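
The behaviour of such a combined pattern can be checked outside regex101 as well. Below is a minimal sketch in Python (picked only for illustration); the sample strings and the helper name `which_group` are assumptions, and the first alternative is written in its non-greedy form a(.*?)b.

```python
import re

# Alternative 1 uses "." (no line breaks by default); alternative 2 uses [\s\S].
PATTERN = re.compile(r"a(.*?)b|a([\s\S]*?)b")

def which_group(text):
    """Return (group_number, captured_text) for the first match, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # m.lastindex is the index of the last capturing group that took part in the match.
    return (m.lastindex, m.group(m.lastindex))

print(which_group("a123b"))    # content on a single line: captured by group 1
print(which_group("a1\n2b"))   # content spans a line break: captured by group 2
```

Because the alternatives are tried left to right, group 2 only captures when the . alternative fails, which here happens when a line break sits between a and b.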


If you try this pattern on regex101, the highlighting makes the two cases easy to tell apart: capturing group #1 is shown in green and capturing group #2 in red.

So, in conclusion, [\s\S] is a regex trick for matching across multiple lines when . does not suit your needs. Whether you need it depends on your use case.

However, if you use the single line flag, where . also matches new lines, then you don't need the regex trick: everything is captured by group 1 (green on regex101) and group 2 (red) is not matched.
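
As a rough illustration of the flag's effect, here is a small Python sketch. `re.DOTALL` is Python's name for the single line flag; the sample text is an assumption made up for this example.

```python
import re

text = "a1\n2b"
pattern = r"a(.*?)b|a([\s\S]*?)b"

# Without the flag, "." stops at the line break, so group 2 does the capturing.
default = re.search(pattern, text)

# With re.DOTALL, "." also matches "\n", so the first alternative succeeds.
dotall = re.search(pattern, text, re.DOTALL)

print(default.groups())  # group 1 is None, group 2 holds the content
print(dotall.groups())   # group 1 holds the content, group 2 is None
```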

I also created a JavaScript performance test, and the choice between the two affects performance by around 25%:

https://jsperf.com/ss-vs-dot

Federico Piazza answered Oct 03 '22 01:10


The answer is: it depends.
If your regex engine matches every character with . then yes, the result is the same. If it doesn't, then no, the result is not the same. In standard JavaScript, for example, . does not match line breaks.
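
One way to see the "it depends" for a given engine is to test single characters directly. The sketch below does this for Python's `re` module only; the sample list and the helper name `matches_all` are assumptions for illustration, and other engines need their own check (JavaScript's . excludes more line terminators than just \n, for instance).

```python
import re

# One-character samples, including line-break characters.
samples = ["x", " ", "7", "_", "\t", "\r", "\n"]

def matches_all(pattern, flags=0):
    """True if the pattern fully matches every one-character sample."""
    return all(re.fullmatch(pattern, ch, flags) for ch in samples)

# A class combined with its negation covers every character.
for cls in (r"[\s\S]", r"[\w\W]", r"[\d\D]"):
    print(cls, matches_all(cls))

# In Python, "." rejects "\n" unless re.DOTALL is set.
print(".", matches_all("."))
print(". with DOTALL", matches_all(".", re.DOTALL))
```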

m00hk00h answered Oct 03 '22 00:10