
What is the difference between [0-9]+ and [0-9]++?

Tags:

regex

php

Can someone explain the difference between [0-9]+ and [0-9]++?

user557108 asked May 31 '11 11:05


2 Answers

The PCRE engine, which PHP uses for regular expressions, supports "possessive quantifiers":

Quantifiers followed by + are "possessive". They eat as many characters as possible and don't return to match the rest of the pattern. Thus .*abc matches "aabc" but .*+abc doesn't because .*+ eats the whole string. Possessive quantifiers can be used to speed up processing.

And:

If the PCRE_UNGREEDY option is set (an option which is not available in Perl) then the quantifiers are not greedy by default, but individual ones can be made greedy by following them with a question mark. In other words, it inverts the default behaviour.

The difference is thus:

/[0-9]+/  - one or more digits; greedy unless the PCRE_UNGREEDY option is set
/[0-9]+?/ - one or more digits, as few as possible (lazy); greedy instead if PCRE_UNGREEDY is set
/[0-9]++/ - one or more digits, as many as possible, never giving any back (possessive)

In greedy-by-default mode, the first and last patterns match the same text when used on their own: both take as many digits as possible, and nothing follows the quantifier that could force it to give characters back. They differ only when a later part of the pattern needs some of those digits.

With PCRE_UNGREEDY applied (ungreedy-by-default mode), the defaults are reversed: + becomes lazy and +? becomes greedy, while ++ remains possessive.
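
A minimal sketch of the comparison, assuming PHP's preg_match and the U pattern modifier (PHP's spelling of PCRE_UNGREEDY). The subject string and the variable names are made up for illustration:

```php
<?php
// Illustrative subject: a run of digits surrounded by letters.
$subject = 'abc12345def';

$patterns = [
    '/[0-9]+/',    // greedy by default: expected to take the whole digit run
    '/[0-9]+?/',   // lazy by default: expected to stop after one digit
    '/[0-9]++/',   // possessive: takes the whole run and keeps it
    '/[0-9]+/U',   // U inverts the default: + behaves lazily
    '/[0-9]+?/U',  // U inverts the default: +? behaves greedily
    '/[0-9]++/U',  // possessive quantifiers are not affected by U
];

$results = [];
foreach ($patterns as $pattern) {
    // preg_match stores the full match in $m[0].
    preg_match($pattern, $subject, $m);
    $results[$pattern] = $m[0];
    echo str_pad($pattern, 12), ' => ', $m[0], PHP_EOL;
}
```

With nothing after the quantifier, + and ++ should report the same match; only the lazy variants stop early.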

Lightness Races in Orbit answered Nov 15 '22 22:11


++ (and ?+, *+ and {n,m}+) are called possessive quantifiers.

Both [0-9]+ and [0-9]++ match one or more ASCII digits, but the second one will not allow the regex engine to backtrack into the match if that should become necessary for the overall regex to succeed.

Example:

[0-9]+0

matches the string 00, whereas [0-9]++0 doesn't.

In the first case, [0-9]+ first matches 00, but then backtracks one character to allow the following 0 to match. In the second case, the ++ prevents this, therefore the entire match fails.
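
The same example can be checked in PHP. This is only a sketch using preg_match; the variable names are illustrative:

```php
<?php
$subject = '00';

// Greedy: [0-9]+ first takes "00", then gives one character back
// so that the trailing literal 0 can match.
$greedy = preg_match('/[0-9]+0/', $subject, $greedyMatch);

// Possessive: [0-9]++ takes "00" and refuses to give anything back,
// so the trailing literal 0 has nothing left to match.
$possessive = preg_match('/[0-9]++0/', $subject, $possessiveMatch);

var_dump($greedy);      // int(1)
var_dump($greedyMatch); // the full match "00" at index 0
var_dump($possessive);  // int(0)
```

In fact [0-9]++0 can never match any string: the possessive run consumes every consecutive digit, so the character after it is never a 0.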

Not all regex flavors support this syntax; some others implement atomic groups instead (or even both).
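
PCRE, and therefore PHP, supports both forms. As a rough sketch, the atomic group (?>...) below should behave like the possessive quantifier; the "100px" subject is just an illustrative input:

```php
<?php
// Atomic group and possessive quantifier both refuse to backtrack,
// so neither pattern can match "00".
$atomicFails     = preg_match('/(?>[0-9]+)0/', '00');
$possessiveFails = preg_match('/[0-9]++0/', '00');

// When no backtracking is needed, both forms match normally.
$atomicOk     = preg_match('/(?>[0-9]+)px/', '100px', $atomicMatch);
$possessiveOk = preg_match('/[0-9]++px/', '100px', $possessiveMatch);

var_dump($atomicFails, $possessiveFails);            // int(0) int(0)
var_dump($atomicMatch[0], $possessiveMatch[0]);      // "100px" for both
```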

Tim Pietzcker answered Nov 15 '22 22:11