
regex matches repeating group {0,2} or {0,4} but {0,3} doesn't

Tags: regex, pcre

First, this is using PHP's preg functions.

String I'm trying to match:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b c d xp

My regexes and their matches:

(\S*\s*){0,1}\S*p = "d xp"
(\S*\s*){0,2}\S*p = "c d xp"
(\S*\s*){0,3}\S*p = NO MATCH (expecting "b c d xp")
(\S*\s*){0,4}\S*p = entire string
(\S*\s*){0,5}\S*p = entire string

Oddly, if I remove a single "a" it works. Also, (\S*\s*){0,3}\Sp or (\S*\s){0,3}\S*p both work.

Can someone explain why the third case results in no matches instead of "b c d xp"?

TIA!

robgmills asked Jan 24 '23 01:01



1 Answer

Good question.

I tried another language that also has Perl RE syntax, Ruby, and it returned the expected string:

$ irb
>> s='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b c d xp'
=> "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b c d xp"
>> s[/(\S*\s*){0,3}\S*p/]
=> "b c d xp"

This made me think you found an interpreter bug...

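But we now know that:

  • Your RE was correct, as was your expectation of its results.
  • PHP's PCRE functions limit how much backtracking a single match may do (the pcre.backtrack_limit setting), and your expression hit that limit, so the match was abandoned instead of returning "b c d xp". That would also explain why removing a single "a" made it work: the shorter string presumably needs just few enough backtracks to stay under the limit. Ruby either doesn't check or has a different limit.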
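
For comparison, here is a small Ruby sketch that runs all five patterns from the question against the same string. The string and patterns are taken from the question; the loop and the `results` name are only illustrative. It shows what an engine should return when the match is not cut off by a backtrack limit:

```ruby
# Test string from the question.
s = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa b c d xp'

# Build (\S*\s*){0,n}\S*p for n = 1..5 and record what each one matches.
# String#[] with a Regexp returns the matched substring, or nil on no match.
results = (1..5).map do |n|
  re = Regexp.new("(\\S*\\s*){0,#{n}}\\S*p")
  [n, s[re]]
end

results.each do |n, match|
  puts "{0,#{n}} => #{match.inspect}"
end
```

With {0,3} this should print "b c d xp", as in the irb session above. The {0,4} and {0,5} cases match the entire string on the first attempt, from position 0, so they never need the long run of failed attempts across the "a"s that {0,3} goes through before it reaches "b".
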
DigitalRoss answered Feb 12 '23 01:02
