For the common problem of matching text between delimiters (e.g. < and >), there are two common patterns:
- a greedy * or + quantifier applied to a negated character class, in the form START [^END]* END, e.g. <[^>]*>, or
- a lazy *? or +? quantifier in the form START .*? END, e.g. <.*?>.
Is there a particular reason to favour one over the other?
'Greedy' means match longest possible string. 'Lazy' means match shortest possible string.
You make it non-greedy by using ".*?". When using the latter construct, the regex engine will, at every step where it matches text into the ".", first attempt to match whatever may come after the ".*?". This means that if, for instance, nothing comes after the ".*?", it matches nothing.
It means the greedy quantifiers will match their preceding element as many times as possible, returning the longest match possible. Non-greedy quantifiers are the opposite: they match as few times as possible, returning the shortest match possible. In both cases the rest of the pattern must still be able to match.
So the difference between the greedy and the non-greedy match is the following: The greedy match will try to match as many repetitions of the quantified pattern as possible. The non-greedy match will try to match as few repetitions of the quantified pattern as possible.
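The difference is easy to see on a small input. The following is a minimal sketch using Python's re module (an assumption, since the discussion does not name a regex engine); the variable names are illustrative only.

```python
import re

text = "<a><b>"

# Greedy: ".*" takes as much as it can, then gives back only what it must.
greedy = re.search(r"<.*>", text).group()

# Lazy: ".*?" takes as little as it can, so it stops at the first ">".
lazy = re.search(r"<.*?>", text).group()

# Negated class: "[^>]*" is greedy, but it can never cross a ">".
negated = re.search(r"<[^>]*>", text).group()

print(greedy)   # <a><b>
print(lazy)     # <a>
print(negated)  # <a>

# With nothing after it, a lazy quantifier matches the empty string.
trailing_lazy = re.search(r"a.*?", "abc").group()
print(trailing_lazy)  # a
```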
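Some advantages:
[^>]*:
- Matches newlines regardless of the /s flag.
- Generally considered quicker, because with [^>] the engine doesn't make choices: we give it only one way to match the pattern against the string.
.*?:
- Simpler when the END delimiter is more than one character long, where a negated character class no longer works. A common alternative is (?:(?!END).)*. This is even worse if the END delimiter is another pattern.
The negated character class is also more explicit, i.e. it definitely excludes the closing delimiter from being part of the matched text. This is not guaranteed with the lazy quantifier (if the regular expression is extended to match more than just this tag).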
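Two of these points can be checked directly. The sketch below again assumes Python's re module, where the /s flag is spelled re.DOTALL; the sample strings and the "-->" delimiter are made up for illustration.

```python
import re

multiline = "<a\nhref='x'>"

# "." does not match a newline unless re.DOTALL (the /s flag) is set.
lazy_default = re.search(r"<.*?>", multiline)
lazy_dotall = re.search(r"<.*?>", multiline, re.DOTALL)

# The negated class matches the newline with no flag at all.
negated = re.search(r"<[^>]*>", multiline)

print(lazy_default)                      # None
print(lazy_dotall.group() == multiline)  # True
print(negated.group() == multiline)      # True

# Multi-character END delimiter: a character class cannot express
# "anything but -->", so the choice is between a lazy quantifier
# and the (?:(?!END).)* form.
comment = "<!-- a - b --> tail -->"
lazy_comment = re.search(r"<!--.*?-->", comment).group()
tempered = re.search(r"<!--(?:(?!-->).)*-->", comment).group()

print(lazy_comment)  # <!-- a - b -->
print(tempered)      # <!-- a - b -->
```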
Example: If you try to match <tag1><tag2>Hello! with <.*?>Hello!, the regex will match <tag1><tag2>Hello!, whereas <[^>]*>Hello! will match <tag2>Hello!
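The same example can be reproduced as a short script. This is a sketch assuming Python's re module; other backtracking engines should behave the same way for these two patterns.

```python
import re

text = "<tag1><tag2>Hello!"

# The lazy "." is still allowed to match ">" and "<", so the match starts
# at the first "<" and grows until ">Hello!" can follow.
lazy = re.search(r"<.*?>Hello!", text).group()

# "[^>]*" cannot cross the first ">", so the attempt at "<tag1>" fails
# and the match starts at "<tag2>" instead.
negated = re.search(r"<[^>]*>Hello!", text).group()

print(lazy)     # <tag1><tag2>Hello!
print(negated)  # <tag2>Hello!
```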