Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does "?>" mean in a PCRE regex?

Tags:

regex

php

I can't seem to figure out what ?> is used for in a regular expression. For example, the following:

(?>[^()]+)

I know that ?: means that it shouldn't store the match if you're not going to back reference the match. Is this somehow related?

Is this also related to a regular expression? (?P>name) or (?&name)

Source: http://php.net/manual/en/regexp.reference.recursive.php

like image 510
Vivendi Avatar asked Mar 14 '13 15:03

Vivendi


1 Answers

(?>pattern) prevents backtracking on pattern. It has at least 2 names: non-backtracking group, atomic group. I will refer to it as non-backtracking group, since it is the most descriptive name.

The expression (?>[^()]+) alone doesn't need to be made non-backtracking, though. There is nothing that can induce a backtrack to show non-backtracking behavior.

A more interesting example would be the regex ^\((?>[^()]+)\), matching against the string (a + b(), compare to the normal version without the non-backtracking group ^\([^()]+\).

The normal version, after trying (a + b for ^\([^()]+ and fail to match literal ) will backtrack by one character and retry with (a +, etc. until (a, where it fails after exhausting all possibilities.

The non-backtracking version will fail the match right after its first try with (a + b.

Non-backtracking group is mostly useful to reduce backtracking induced by quantifiers (?, *, +, {n,}, {n,m}). The trick to optimization with non-backtracking group is to know the first try made by the regex engine. You may need to shift the regex around to make sure the first try made by the engine is what you want to match - then it can be made non-backtracking.

As an example of optimization with non-backtracking group:

  • How can I improve the performance of a .NET regular expression?

    The question I quote is from .NET, but it uses the same syntax for non-backtracking group.

    In the question above, the original regex has many usages of * and + quantifiers. It induces unnecessary backtracking when the match fails, which affect the performance on large input.

  • StackOverflowError when matching large input using RegEx

    Another example. Note that possessive quantifier (adding + after normal quantifier, e.g. ?+, ++, *+, etc.) and non-backtracking group has the same behavior of non-backtracking, just the syntax of non-backtracking group allows it to be generalized.

    You won't get stack overflow in PHP like in Java, but the performance should be better when you validate a long string.

like image 174
nhahtdh Avatar answered Oct 03 '22 21:10

nhahtdh