Regex atomic grouping does not seem to work in preg_match_all()

Tags: regex, php

I've recently been experimenting with regular expressions, and one thing doesn't behave as I expect when I use preg_match_all in PHP.

I'm using an online regex tool at http://www.solmetra.com/scripts/regex/index.php.

The regex I'm using is /(?>x|y|z)w/, and I'm matching it against the string abyxw. I expect it to fail, yet it succeeds and matches xw.

I expect it to fail because of the atomic group, which, according to several sources I have read, prevents backtracking. Specifically, I expect the engine to match y with the alternation, then try to match the literal w and fail because the next character is x. Normally it would backtrack at that point, but the atomic group should prevent that. So as far as I understand, the engine should be stuck with the y it matched inside the atomic group. Yet it is not.

I would appreciate any light shed on this situation. :)

Neob91 asked Oct 05 '22 19:10


1 Answer

This is a little tricky. There are two things the regex engine can do when a match attempt fails:

  • Advance the starting position - If the match cannot succeed starting at index i, it is attempted again starting at index i+1, and this continues until a match is found or the end of the string is reached.
  • Backtracking - If repetition or alternation is used in the regex, the engine can discard part of an unsuccessful match and try again, using fewer or more repetitions or a different branch of the alternation.

Atomic groups prevent backtracking but they do not affect advancing the starting position.
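
To see an atomic group actually blocking backtracking, here is a minimal sketch. The pattern a(?>bc|b)c and the subject abc are illustrative choices of mine, not taken from the question; they are picked so that the match can only succeed by backtracking into the group.

```php
<?php
// Illustrative pattern and subject (assumptions, not from the question).
$subject = 'abc';

// Non-atomic group: "bc" is tried first, the trailing "c" then fails,
// so the engine backtracks into the group and tries the "b" branch.
$plain = preg_match('/a(?:bc|b)c/', $subject);

// Atomic group: once "bc" has matched, the "b" branch is thrown away,
// so when the trailing "c" fails there is nothing left to retry.
$atomic = preg_match('/a(?>bc|b)c/', $subject);

var_dump($plain);  // int(1)
var_dump($atomic); // int(0)
```

Advancing the starting position does not help here either, because no later position in abc begins with a.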

In this case, the attempt that starts at y fails: the atomic group matches y, but the next character is x rather than w. The engine then advances the starting position to x, where the group matches x and the literal w matches, so the overall match is xw.
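
A short PHP sketch of this, using the pattern and subject from the question. The PREG_OFFSET_CAPTURE flag reports where the match starts, and the \G variant is my own addition to show what happens when the engine is not allowed to advance the starting position past the y.

```php
<?php
$subject = 'abyxw';

// Unanchored: the engine retries at each starting position in turn.
$count = preg_match_all('/(?>x|y|z)w/', $subject, $matches, PREG_OFFSET_CAPTURE);
$text   = $matches[0][0][0]; // matched text
$offset = $matches[0][0][1]; // byte offset of the match in $subject

// \G pins the match to the offset passed as the 5th argument (2, the "y"),
// so only the attempt that starts at "y" is allowed.
$pinned = preg_match('/\G(?>x|y|z)w/', $subject, $m, 0, 2);

var_dump($count, $text, $offset); // int(1), string(2) "xw", int(3)
var_dump($pinned);                // int(0)
```

The match found by preg_match_all starts at offset 3 (the x), not at the y, which is consistent with the explanation above: the attempt at y fails, and the engine simply moves on.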

Andrew Clark answered Oct 12 '22 23:10