When we include shorthand for character class and negated-character class in same character class, is it same as dot . which mean any character ?
I did a test on regex101.com and every character matched.
Is [\s\S]
[\w\W]
and [\d\D]
same as .
?
I want to know if this behavior is persistent in web's front and backend languages like Javascript, Php, Python and others.
In regular expressions, the dot or period is one of the most commonly used metacharacters. Unfortunately, it is also the most commonly misused metacharacter. The dot matches a single character, without caring what that character is. The only exception are line break characters.
The regular expression \s is a predefined character class. It indicates a single whitespace character. Let's review the set of whitespace characters: [ \t\n\x0B\f\r] The plus sign + is a greedy quantifier, which means one or more times.
That is, this regex shall match the entire input string, instead of a part of the input string (substring). \w+ matches one or more word characters (same as [a-zA-Z0-9_]+ ). \. matches the dot (.)
\s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character. \t, \n, \r -- tab, newline, return. \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)
"No" it is not the same. It has an important difference if you are not using the single line
flag (meaning that .
does not match all).
The [\s\S]
comes handy when you want to do mix of matches when the .
does not match all.
It is easier to explain it with an example. Suppose you want to capture whatever is between a
and b
, so you can use pattern a(.*?)b
(?
is for ungreedy matches and parentheses for capturing the content), but if there are new lines suppose you don't want to capture this in the same group, so you can have another regex like a([\s\S]*?)b
.
Therefore if we create one pattern using both approaches it results in:
a(.*)b|a([\s\S]*?)b
In this case, if you see the scenario in regex101, then you will have a colorful and easy way to differentiate the scenarios (in green capturing group #1 and in red capturing group #2):
So, in conclusion, the [\s\S]
is a regex trick when you want to match multiple lines and the .
does not suit your needs. It basically depends on your use case.
However, if you use the single line
flag where .
matches new lines, then you don't need the regex trick, below you can see that all is green and group 2 (red above) is not matched:
Have also created a javascript performance test and it impacts in the performance around 25%:
https://jsperf.com/ss-vs-dot
The answer is: It depends.
If your regex engine does match every character with .
then yes, the result is the same. If it doesn't then no, the result is not the same. In standard JavaScript .
, for example, does not match line breaks.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With