Can someone explain to me what the difference is between [0-9]+ and [0-9]++?
The PCRE engine, which PHP uses for regular expressions, supports "possessive quantifiers":
Quantifiers followed by + are "possessive". They eat as many characters as possible and don't return to match the rest of the pattern. Thus .*abc matches "aabc" but .*+abc doesn't because .*+ eats the whole string. Possessive quantifiers can be used to speed up processing.
And:
If the PCRE_UNGREEDY option is set (an option which is not available in Perl) then the quantifiers are not greedy by default, but individual ones can be made greedy by following them with a question mark. In other words, it inverts the default behaviour.
The difference is thus:
/[0-9]+/ - one or more digits; greediness defined by the PCRE_UNGREEDY option
/[0-9]+?/ - one or more digits, but as few as possible (non-greedy)
/[0-9]++/ - one or more digits, as many as possible, without ever giving any back (possessive)
This snippet visualises the difference when in greedy-by-default mode. Note that, on its own, the first pattern matches the same text as the last, because greediness is already applied by default; the additional + only matters when the rest of a pattern would need some of those digits back.
This snippet visualises the difference when applying PCRE_UNGREEDY (ungreedy-by-default mode). See how the default is reversed.
++ (and ?+, *+ and {n,m}+) are called possessive quantifiers.
Both [0-9]+ and [0-9]++ match one or more ASCII digits, but the second one will not allow the regex engine to backtrack into the match if that should become necessary for the overall regex to succeed.
Example:
[0-9]+0 matches the string 00, whereas [0-9]++0 doesn't.
In the first case, [0-9]+ first matches 00, but then backtracks one character to allow the following 0 to match. In the second case, the ++ prevents this, therefore the entire match fails.
Not all regex flavors support this syntax; some others implement atomic groups instead (or even both).