When we include the shorthand for a character class and its negated counterpart in the same character class, is the result the same as the dot <code>.</code>, which means any character? I ran a test on regex101.com and every character matched. Are <code>[\s\S]</code>, <code>[\w\W]</code> and <code>[\d\D]</code> the same as <code>.</code>? I want to know whether this behavior is consistent across web front-end and back-end languages such as JavaScript, PHP, Python and others.

Is [\s\S] same as . (dot)?

2 Answers

No, it is not the same. There is an important difference when you are not using the single line flag, because then . does not match every character: it stops at new lines.

[\s\S] comes in handy when you want to mix both kinds of matches in a pattern where . does not match everything.
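As a minimal sketch of that difference, the snippet below uses Python's re module (one of the languages named in the question; other engines may differ in detail). The sample string and the variable names are illustrative assumptions, not part of the original answer.

```python
import re

text = "a\nb"  # a newline sits between "a" and "b"

# Without flags, "." matches any character except "\n".
dot_default = re.fullmatch(r"a.b", text)

# Each class pairs a shorthand with its negation, so it matches "\n" too.
class_results = {
    cls: re.fullmatch("a" + cls + "b", text) is not None
    for cls in (r"[\s\S]", r"[\w\W]", r"[\d\D]")
}

print(dot_default)      # None
print(class_results)    # every value is True
```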

It is easier to explain with an example. Suppose you want to capture whatever is between a and b. You can use the pattern a(.*?)b (the ? makes the quantifier ungreedy, and the parentheses capture the content). But suppose that when the content contains new lines, you don't want to capture it in the same group. For that case you can use another regex such as a([\s\S]*?)b.

Combining both approaches into one pattern gives:

a(.*)b|a([\s\S]*?)b

In this case, if you look at the scenario in regex101, you get a colorful and easy way to tell the two cases apart: content on a single line lands in capturing group #1 (shown in green), and content that spans new lines lands in capturing group #2 (shown in red).
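The same split between the two groups can be reproduced outside regex101. The following is a sketch in Python, using the combined pattern exactly as written in the answer; the helper name matched_group and the sample strings are hypothetical.

```python
import re

# The combined pattern from the answer, copied as written.
pattern = re.compile(r"a(.*)b|a([\s\S]*?)b")

def matched_group(text):
    """Return (group_number, captured_text) for the first match, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    return (1, m.group(1)) if m.group(1) is not None else (2, m.group(2))

print(matched_group("a one b"))        # (1, ' one ')
print(matched_group("a one\ntwo b"))   # (2, ' one\ntwo ')
```

The first alternative fails on the second string because . cannot cross the newline, so the engine falls through to the [\s\S] alternative.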

So, in conclusion, [\s\S] is a regex trick for when you want to match across multiple lines and . does not suit your needs. It basically depends on your use case.

However, if you use the single line flag, where . matches new lines, then you don't need the regex trick: everything is matched by group 1 (green), and group 2 (red above) is not matched.
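In Python the single line flag is spelled re.DOTALL (or the inline form (?s)); other engines use different names, so treat the flag names here as Python-specific. A short sketch with a hypothetical sample string:

```python
import re

text = "a one\ntwo b"

# re.DOTALL is Python's name for the single line flag: "." now matches "\n".
with_flag = re.search(r"a(.*)b|a([\s\S]*?)b", text, re.DOTALL)

print(repr(with_flag.group(1)))   # ' one\ntwo '
print(with_flag.group(2))         # None, the second alternative is never tried

# The inline form (?s) switches the same mode on from inside the pattern.
inline = re.search(r"(?s)a(.*)b", text)
print(repr(inline.group(1)))      # ' one\ntwo '
```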

I have also created a JavaScript performance test, and the choice between the two affects performance by around 25%:

https://jsperf.com/ss-vs-dot
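To check the cost in your own engine, a rough timing sketch along these lines may help. It is written in Python with timeit, so it does not reproduce the JavaScript benchmark above, and the input size, repeat count and names are arbitrary assumptions; the numbers will vary by machine and engine.

```python
import re
import timeit

# No newlines in the input, so both patterns match the same span.
text = "a" + "x" * 1000 + "b"

dot = re.compile(r"a(.*?)b")
cls = re.compile(r"a([\s\S]*?)b")

def time_pattern(compiled, repeats=200):
    """Total seconds for `repeats` searches over `text`."""
    return timeit.timeit(lambda: compiled.search(text), number=repeats)

dot_seconds = time_pattern(dot)
cls_seconds = time_pattern(cls)
print(f". took {dot_seconds:.4f}s, [\\s\\S] took {cls_seconds:.4f}s")
```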

148

answered Oct 03 '22 01:10

Federico Piazza

The answer is: it depends.
If your regex engine matches every character with ., then yes, the result is the same. If it doesn't, then no, the result is not the same. In standard JavaScript, for example, . does not match line breaks.
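One way to see what "it depends" means for a given engine is to ask it directly which characters . refuses to match. The sketch below does this for Python's re module only; the helper name and the 0x3000 scan limit are assumptions chosen for illustration. JavaScript without the s flag would give a longer list, since its . also skips \r, \u2028 and \u2029.

```python
import re

def unmatched_code_points(pattern, flags=0, limit=0x3000):
    """List code points below `limit` that `pattern` fails to match."""
    compiled = re.compile(pattern, flags)
    return [cp for cp in range(limit) if compiled.fullmatch(chr(cp)) is None]

print(unmatched_code_points(r"."))               # [10], i.e. only "\n"
print(unmatched_code_points(r".", re.DOTALL))    # []
print(unmatched_code_points(r"[\s\S]"))          # []
```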

answered Oct 03 '22 00:10

m00hk00h

Tags: regex, regex-negation, character-class

Rahul
