Which would be better: non-greedy regex or negated character class?

Tags: string, regex, regex-negation, non-greedy

I need to match @anything_here@ from the string @anything_here@dhhhd@shdjhjs@, so I used the following regex:

^@.*?@

or

^@[^@]*@

Both work, but I would like to know which is the better solution: a regex with non-greedy repetition or a regex with a negated character class?

asked Dec 21 '16 18:12

Pranav C Balan

2 Answers

It is clear the ^@[^@]*@ option is much better.

The negated character class is quantified greedily, which means the regex engine grabs zero or more characters other than @ right away, as many as possible. The matching looks like this:

[Image: https://i.stack.imgur.com/a2Er0.png]

When you use a lazy dot matching pattern, the engine matches the opening @ and then, because .*? is lazy and is skipped at first, immediately tries to match the trailing @. It does not find @ at index 1, so .*? expands to match the a character. The .*? pattern keeps expanding one character at a time, once for every character before the next @.

These are the matching steps for the lazy dot pattern:

[Image: https://i.stack.imgur.com/7bVL7.png]
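The expansion described above can be mimicked with a toy model. The Python sketch below is only an illustration: `lazy_dot_attempts` is a hypothetical helper, not part of any regex library, and real engines apply optimizations that it ignores. It counts how often the closing @ has to be tried while .*? grows one character at a time.

```python
def lazy_dot_attempts(text):
    """Toy model of ^@.*?@: return (match, attempts at the closing '@')."""
    if not text.startswith("@"):
        return None, 0
    attempts = 0
    pos = 1  # index right after the opening '@'
    while pos < len(text):
        attempts += 1                  # try the closing '@' at this index
        if text[pos] == "@":
            return text[:pos + 1], attempts
        if text[pos] == "\n":          # '.' does not match a newline by default
            return None, attempts
        pos += 1                       # .*? expands by one character
    return None, attempts


match, attempts = lazy_dot_attempts("@anything_here@dhhhd@shdjhjs@")
print(match, attempts)  # @anything_here@ 14
```

The 14 attempts are the 13 failed tries over the characters of anything_here plus the one that finally finds the closing @.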

answered Oct 19 '22 11:10

Wiktor Stribiżew

Negated character classes should usually be preferred over lazy matching where possible.

If the regex is successful, ^@[^@]*@ can match the content between @s in a single step, while ^@.*?@ needs to expand for each character between @s.

When the match fails (because there is no closing @), many regex engines apply a little magic and internally treat [^@]* as the possessive [^@]*+ (PCRE calls this auto-possessification). This is safe because there is a clear-cut border between @ and non-@ characters: [^@]* matches to the end of the string, the engine sees that the @ is missing, and it fails immediately instead of backtracking. .*? still expands character by character as usual.
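To get a feel for what skipping the backtracking saves, here is a toy Python model of ^@[^@]*@ failing on a string without a closing @. `closing_at_attempts` is a hypothetical helper, and its `possessive` flag only imitates the [^@]*+ behaviour; it is a sketch of the idea, not how any particular engine is implemented.

```python
def closing_at_attempts(text, possessive):
    """Toy model of ^@[^@]*@: count attempts to match the closing '@'."""
    if not text.startswith("@"):
        return 0
    end = 1
    while end < len(text) and text[end] != "@":
        end += 1                       # greedy run of [^@]*
    attempts = 0
    # Try '@' right after the run, then give back one character at a time.
    for pos in range(end, 0, -1):
        attempts += 1
        if pos < len(text) and text[pos] == "@":
            return attempts            # closing '@' found
        if possessive:
            break                      # [^@]*+ never gives characters back
    return attempts


print(closing_at_attempts("@abcdef", possessive=True))   # 1
print(closing_at_attempts("@abcdef", possessive=False))  # 7
```

In this model the plain greedy version retries once for every character it gives back, while the possessive version fails after a single attempt.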

In larger patterns, [^@]* will also never expand past the closing @, whereas a lazy quantifier can. For example, ^@[^@]*a[^@]*@ won't match @bbbb@a@, while ^@.*?a.*?@ will.
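This difference is easy to check. A minimal sketch using Python's `re` module (the variable names are just for illustration):

```python
import re

text = "@bbbb@a@"

strict = re.match(r"^@[^@]*a[^@]*@", text)  # [^@]* cannot cross the second '@'
loose = re.match(r"^@.*?a.*?@", text)       # .*? expands across it to reach 'a'

print(strict)          # None
print(loose.group(0))  # @bbbb@a@
```

The lazy pattern matches the whole string, which is probably not what was intended if the a is supposed to sit between the first pair of @s.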

Note that [^@] also matches newlines, while . doesn't (in most regex engines, unless singleline mode is enabled). If that is not wanted, add the newline character to the negated class: [^@\n].
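A quick Python check of the newline difference, using a made-up two-line sample string. `re.DOTALL` is Python's name for singleline mode.

```python
import re

text = "@line one\nline two@rest"  # made-up sample with a newline between the @s

dot_default = re.match(r"^@.*?@", text)            # '.' stops at the newline
dot_dotall = re.match(r"^@.*?@", text, re.DOTALL)  # singleline mode: '.' matches '\n'
negated = re.match(r"^@[^@]*@", text)              # [^@] matches '\n' too
negated_no_nl = re.match(r"^@[^@\n]*@", text)      # newline added to the negation

print(dot_default)                # None
print(repr(dot_dotall.group(0)))  # '@line one\nline two@'
print(repr(negated.group(0)))     # '@line one\nline two@'
print(negated_no_nl)              # None
```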

answered Oct 19 '22 11:10

Sebastian Proske
