This:
preg_match('~foo(.*?)(bar)?~','foo bar',$m);
gives me this:
Array
(
[0] => foo
[1] =>
)
I'm kinda confused about this. I get that group 1 is giving me an empty string, because it's a lazy match. But shouldn't (bar)? be greedy and thus give me capture group 2?
Seems reasonable to me that what I should be getting is
Array
(
[0] => foo
[1] =>
[2] => bar
)
where [1] is a space. And yet... this is not happening. Why?
The answer here is surprisingly simple. On the first pass, the first group matches nothing, not even the space. The second group then tries to match "bar" at the position of the space, which, of course, fails. If anything that HAS to match came after the second group, the engine would now backtrack and expand the first capturing group to match the space. But the match is perfectly fine the way it is now (the second group actually CAN be empty), so it just stays that way.
To produce what you expect, try this:
preg_match('~foo(.*?)(bar)?_~', 'foo bar_', $m);
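To see the backtracking at work, here is a minimal sketch that runs both patterns side by side with preg_match. The variable names $without and $with are just illustrative, and the trailing "_" is only a stand-in for "something that has to match after the optional group":

```php
<?php
// Nothing is required after (bar)?, so the lazy group stays empty,
// (bar)? fails at the space and is skipped, and the match ends after "foo".
preg_match('~foo(.*?)(bar)?~', 'foo bar', $without);
print_r($without); // [0] => "foo", [1] => ""

// The mandatory "_" cannot match at the space, so the engine backtracks:
// (.*?) grows to take the space, (bar)? then matches "bar", and "_" matches.
preg_match('~foo(.*?)(bar)?_~', 'foo bar_', $with);
print_r($with);    // [0] => "foo bar_", [1] => " ", [2] => "bar"
```

Note that PHP drops trailing groups that did not take part in the match, which is why the first result has no [2] entry at all.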
No, this behaviour is correct. From the documentation on lazy matching:
if a quantifier is followed by a question mark, then it becomes lazy, and instead matches the minimum number of times possible
Since (bar)? is optional, (.*?) does not need to match anything in order for the regular expression to be successful. And since the space between foo and bar was never consumed, the engine is still sitting at the space when it tries (bar)?, so it cannot match bar there.
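As a rough illustration of "the minimum number of times possible", the sketch below contrasts a greedy first group with a lazy one that is forced to expand. Anchoring with ^ and $ is just one possible way to force it, not something the question asked for, and the variable names $greedy and $anchored are made up for the example:

```php
<?php
// Greedy first group: (.*) swallows " bar", so nothing is left for (bar)?.
preg_match('~foo(.*)(bar)?~', 'foo bar', $greedy);
print_r($greedy);   // [0] => "foo bar", [1] => " bar"

// Lazy group with anchors: "$" has to match at the end of the string, so
// (.*?) is expanded one character at a time until (bar)? and "$" can succeed.
preg_match('~^foo(.*?)(bar)?$~', 'foo bar', $anchored);
print_r($anchored); // [0] => "foo bar", [1] => " ", [2] => "bar"
```

In both cases the quantifier only gives up (or takes) characters when the rest of the pattern cannot otherwise match.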