I need to match @anything_here@ from the string @anything_here@dhhhd@shdjhjs@, so I used the following regex:
^@.*?@
or
^@[^@]*@
Both ways work, but I would like to know which one is the better solution: a regex with non-greedy repetition or a regex with a negated character class?
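A quick check confirms that both patterns return the same text for the sample input. The question does not name a language, so Python's re module is an assumption here:

```python
import re

s = "@anything_here@dhhhd@shdjhjs@"

# Lazy dot: stop at the first closing @ that lets the match succeed.
lazy = re.match(r"^@.*?@", s)
# Negated class: consume only non-@ characters, then require the @.
negated = re.match(r"^@[^@]*@", s)

print(lazy.group())     # @anything_here@
print(negated.group())  # @anything_here@
```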
You can make the default quantifiers ?, *, +, and {m,n} non-greedy by appending a question mark '?' to them: ??, *?, +?, and {m,n}? ({m}? is also accepted, but it behaves like {m}, since an exact count leaves nothing to minimize). They then “consume” as few characters as possible while still letting the whole pattern match.
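A small sketch in Python's re module, using a made-up input "aaaa", that pairs each greedy quantifier with its lazy form. With nothing after the quantifier in the pattern, the lazy version settles for the minimum count:

```python
import re

s = "aaaa"

# Each greedy quantifier paired with its lazy (non-greedy) counterpart.
pairs = [("a?", "a??"), ("a*", "a*?"), ("a+", "a+?"), ("a{2,3}", "a{2,3}?")]

results = {}
for greedy, lazy in pairs:
    results[greedy] = re.match(greedy, s).group()
    results[lazy] = re.match(lazy, s).group()
    print(f"{greedy:<7} -> {results[greedy]!r:<7} {lazy:<8} -> {results[lazy]!r}")
```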
The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex. By using a lazy quantifier, the expression tries the minimal match first.
In general, the regex engine will try to match as many input characters as possible once it encounters a quantified token like \d+ or, in our case, .*. That behavior is called greedy matching because the engine will eagerly attempt to match anything it can.
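Applied to the string from the question, the difference between greedy and lazy dot matching looks like this in Python's re module (a sketch for illustration):

```python
import re

s = "@anything_here@dhhhd@shdjhjs@"

# Greedy: .* runs to the end of the string, then backtracks to the LAST @.
greedy = re.match(r"^@.*@", s).group()
# Lazy: .*? grows one character at a time and stops at the FIRST @.
lazy = re.match(r"^@.*?@", s).group()

print(greedy)  # @anything_here@dhhhd@shdjhjs@
print(lazy)    # @anything_here@
```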
If you've ever found yourself pulling your hair out trying to build the perfect regular expression to match the least amount of data possible, then non-greedy Perl regexes are what you need. By default, Perl regular expressions are greedy, meaning a pattern such as .* matches as much data as possible (up to the end of the line, since . does not match a newline by default).
The ^@[^@]*@ option is clearly the better one.
The negated character class is quantified greedily, which means the regex engine grabs zero or more characters other than @ right away, as many as possible. See this regex demo and its matching steps:
When you use a lazy dot matching pattern, the engine matches @, then first tries to match the trailing @ (skipping the .*?). It does not find an @ at index 1, so .*? consumes the a char. This repeats: .*? expands once for every character other than @, up to the first @.
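The lazy walk-through can be mimicked with a toy Python function. trace_lazy is a hypothetical helper written for illustration: it is a simplified model of how ^@.*?@ proceeds, not how any real engine is implemented, and it handles only this one pattern:

```python
def trace_lazy(s):
    """Toy model of ^@.*?@ (not real engine internals).

    Returns the match (or None) plus a log of every attempt to match the
    closing @ after .*? has consumed a given number of characters.
    """
    log = []
    if not s.startswith("@"):
        return None, log
    for i in range(1, len(s)):
        if s[i] == "@":
            log.append((i, s[i], "closing @ matches"))
            return s[: i + 1], log
        if s[i] == "\n":
            # . does not match a newline by default, so .*? cannot cross it.
            log.append((i, s[i], "newline: .*? cannot expand"))
            return None, log
        # The closing @ failed here, so .*? expands over one more character.
        log.append((i, s[i], ".*? expands"))
    return None, log


match, log = trace_lazy("@abc@d@")
for index, char, outcome in log:
    print(index, repr(char), outcome)
print("match:", match)
```

For "@abc@d@" the log shows three expansions (a, b, c) before the closing @ matches at index 4.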
See the lazy dot matching pattern demo here, and here are the matching steps:
Negated character classes should usually be preferred over lazy matching, if possible.
If the match succeeds, ^@[^@]*@ can match the content between the @s in a single step, while ^@.*?@ needs to expand once for each character between the @s.
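To see whether the difference shows up in practice, a rough timing sketch with Python's timeit can help. The input is made up, the numbers depend on the engine, the Python version and the machine, and one run does not prove the claim for other engines:

```python
import re
import timeit

# A long @...@ segment makes any per-character cost of the lazy dot visible.
s = "@" + "x" * 10_000 + "@tail@"

lazy = re.compile(r"^@.*?@")
negated = re.compile(r"^@[^@]*@")

# Both patterns must agree on the match before comparing their speed.
assert lazy.match(s).group() == negated.match(s).group()

for name, rx in (("lazy .*?", lazy), ("negated [^@]*", negated)):
    seconds = timeit.timeit(lambda rx=rx: rx.match(s), number=1_000)
    print(f"{name:>14}: {seconds:.4f} s for 1000 matches")
```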
When the match fails (for example, when there is no closing @), many regex engines apply an optimization and internally treat [^@]* as the possessive [^@]*+ (PCRE calls this auto-possessification). This is safe because there is a clear-cut border between @ and non-@ characters: the engine matches to the end of the string, recognizes that the @ is missing, and fails instantly instead of backtracking. .*? will expand character by character as usual.
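[^@]*+ is a possessive quantifier. Python's re module accepts possessive quantifiers only from Python 3.11 on, so this sketch emulates one with the common lookahead-plus-backreference trick (?=(X))\1, which relies on Python never backtracking into a lookahead once it has succeeded. It shows why going possessive is safe for the negated class but not for a dot:

```python
import re

s = "@anything_here@dhhhd@shdjhjs@"
no_end = "@" + "x" * 50  # made-up input with no closing @

# (?=(X))\1 emulates a possessive X: the lookahead captures X once,
# \1 consumes exactly that text, and nothing can be given back later.
possessive_class = re.compile(r"^@(?=([^@]*))\1@")
possessive_dot = re.compile(r"^@(?=(.*))\1@")

# The class never consumes @, so giving up backtracking changes nothing.
print(possessive_class.match(s).group())  # @anything_here@
# The dot swallows every @ and cannot give the last one back: no match.
print(possessive_dot.match(s))            # None
# With no closing @, the possessive class fails without retrying shorter runs.
print(possessive_class.match(no_end))     # None
```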
When used in larger contexts, [^@]* will also never expand past the ending @, while this is entirely possible with lazy matching. For example, ^@[^@]*a[^@]*@ won't match @bbbb@a@, while ^@.*?a.*?@ will.
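The @bbbb@a@ example, checked with Python's re module:

```python
import re

s = "@bbbb@a@"

# [^@]* cannot cross an @, and there is no "a" between the first two @s.
bounded = re.match(r"^@[^@]*a[^@]*@", s)
# .*? is free to expand over the middle @ until it reaches the "a".
unbounded = re.match(r"^@.*?a.*?@", s)

print(bounded)            # None
print(unbounded.group())  # @bbbb@a@
```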
Note that [^@] will also match newlines, while . doesn't (in most regex engines, unless single-line mode is enabled). If that is not wanted, you can avoid it by adding the newline character to the negation: ^@[^@\n]*@.
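A sketch of the newline difference in Python, where single-line mode is spelled re.DOTALL (re.S). The input string is made up:

```python
import re

s = "@line one\nline two@"

# [^@] matches the newline too, so this match spans both lines.
class_match = re.match(r"^@[^@]*@", s)
# . does not match \n by default, so the lazy dot cannot reach the closing @.
dot_match = re.match(r"^@.*?@", s)
# re.DOTALL lets . match \n as well.
dotall_match = re.match(r"^@.*?@", s, re.DOTALL)
# Adding \n to the negated class keeps the match on a single line.
no_newline_match = re.match(r"^@[^@\n]*@", s)

print(repr(class_match.group()))   # '@line one\nline two@'
print(dot_match)                   # None
print(repr(dotall_match.group()))  # '@line one\nline two@'
print(no_newline_match)            # None
```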