
php understanding greedy vs. nongreedy matching

Tags:

regex

php

This:

preg_match('~foo(.*?)(bar)?~','foo bar',$m);

gives me this:

Array
(
    [0] => foo
    [1] => 
)

I'm kinda confused about this. I get that group 1 is giving me an empty string, because it's a lazy match. But shouldn't (bar)? be greedy and thus give me capture group 2?

Seems reasonable to me that what I should be getting is

Array
(
    [0] => foo
    [1] => 
    [2] => bar
)

where [1] is a space. And yet.. this is not happening. Why?

slinkhi asked Nov 03 '13 21:11


2 Answers

The answer here is surprisingly simple. The first group matches nothing on the first pass, not even the space. The second group then tries to match "bar" at the position of the space, which, of course, fails. If anything that HAS to match came after it, the engine would now backtrack and expand the first capturing group to take in the space. But the pattern is already satisfied the way it is (the second group actually CAN be empty), so it just stays that way.
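To see this for yourself, here is a minimal sketch that reruns the pattern from the question and inspects each group. The variable names ($original, $hasGroup2) are only for illustration.

```php
<?php
// Pattern from the question: a lazy group followed by an optional group.
preg_match('~foo(.*?)(bar)?~', 'foo bar', $original);

// The overall match ends right after "foo"; nothing forces it to go further.
var_dump($original[0]); // string(3) "foo"

// The lazy group was satisfied with zero characters.
var_dump($original[1]); // string(0) ""

// Group 2 matched zero times, and preg_match leaves a trailing
// unmatched group out of the result array by default.
$hasGroup2 = isset($original[2]);
var_dump($hasGroup2);   // bool(false)
```

So the array has only two entries because (bar)? did not take part in the match at all, not because it was captured as an empty string.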

To produce what you expect, try this:

preg_match('~foo(.*?)(bar)?_~', 'foo bar_', $m);
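A short sketch of that idea, written with preg_match so the groups land in an array. The second call is my own variation rather than part of the suggestion above: it assumes an end-of-string anchor ($) is acceptable and keeps the original subject unchanged. The variable names are hypothetical.

```php
<?php
// A mandatory "_" after the optional group: the engine has to backtrack
// and let the lazy group grow until "bar" and "_" can both match.
preg_match('~foo(.*?)(bar)?_~', 'foo bar_', $withUnderscore);
print_r($withUnderscore); // [0] => "foo bar_", [1] => " ", [2] => "bar"

// Variation (an assumption, not from the answer): anchor at the end of
// the string instead of changing the subject.
preg_match('~foo(.*?)(bar)?$~', 'foo bar', $withAnchor);
print_r($withAnchor);     // [0] => "foo bar", [1] => " ", [2] => "bar"
```

In both cases something required comes after (bar)?, so stopping right after "foo" no longer counts as a successful match.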


In your edit, you added another capturing group. (.*) is now 2. It matches till the end of the string, as you thought it would. So you're right on that one, you just changed the example ^^
Johannes H. answered Sep 21 '22 10:09


No, this behaviour is correct. From the documentation on lazy matching:

if a quantifier is followed by a question mark, then it becomes lazy, and instead matches the minimum number of times possible

Since (bar)? is optional, (.*?) does not need to match anything in order for the regular expression to be successful. And since the space between foo and bar was never consumed, the engine is still sitting at that space when it tries (bar)?, so it cannot match bar there and the group matches zero times instead.
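For comparison, a small sketch that runs the same subject through the lazy pattern and through a greedy variant. The greedy pattern is my own addition for illustration, as are the variable names.

```php
<?php
$subject = 'foo bar';

// Lazy: (.*?) takes as little as possible, which here is nothing.
preg_match('~foo(.*?)(bar)?~', $subject, $lazy);
print_r($lazy);   // [0] => "foo", [1] => ""

// Greedy: (.*) takes as much as possible, which here is " bar",
// so (bar)? has nothing left and matches zero times.
preg_match('~foo(.*)(bar)?~', $subject, $greedy);
print_r($greedy); // [0] => "foo bar", [1] => " bar"
```

Either way group 2 stays out of the result: with the lazy quantifier the engine never gets past the space, and with the greedy one "bar" has already been swallowed by group 1.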

Tim Cooper answered Sep 22 '22 10:09