Is there a way in the PHP regex functions to get all possible matches of a regex even if those matches overlap?
For example, getting all the 3-digit substrings with '/[\d]{3}/'.
You might expect to get:
"123456" => ['123', '234', '345', '456']
But preg_match_all() only returns
['123', '456']
This is because it begins searching again after the matched substring (as noted in the documentation):
"After the first match is found, the subsequent searches are continued on from end of the last match.".
Is there a way around this without writing a custom parser?
Look-ahead assertions to the rescue!
preg_match_all('/(?=(\d{3}))/', $str, $matches);
print_r($matches[1]);
It captures whatever the look-ahead assertion matches. Since the assertion is zero-width, $matches[0] will contain only empty strings, but $matches[1] will contain the expected captured substrings.
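To make the trick reusable, here is a minimal sketch that wraps an arbitrary sub-pattern in a capturing look-ahead. The helper name overlapping_matches() is hypothetical (it is not a built-in), and it assumes the sub-pattern is passed without delimiters and contains no unescaped '/'.

```php
<?php
// Hypothetical helper, not part of PHP: returns the overlapping matches of
// $inner in $subject by wrapping $inner in a capturing look-ahead.
// Assumption: $inner is a bare sub-pattern (no delimiters, no unescaped '/').
function overlapping_matches(string $inner, string $subject): array
{
    // The look-ahead itself matches an empty string at each position,
    // so the useful text ends up in capture group 1.
    preg_match_all('/(?=(' . $inner . '))/', $subject, $matches);
    return $matches[1];
}

$result = overlapping_matches('\d{3}', '123456');
print_r($result); // 123, 234, 345, 456
```

Keep in mind that this yields at most one match per starting position (whichever one the engine prefers there), not every possible match length.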
This may not be ideal, but at least it's something.
It looks like you could use a positive lookahead with PREG_OFFSET_CAPTURE to get the string offset of every position where a 3-digit number starts, and then extract each number with substr():
$str = "123456";
preg_match_all("/\d(?=\d{2})/", $str, $matches, PREG_OFFSET_CAPTURE);
$numbers = array_map(function($m) use($str){
    return substr($str, $m[1], 3);
}, $matches[0]);
print_r($numbers);
Output
Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
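If you need both the substrings and their positions, one possible variation (a sketch, not the code from the answer above) is to combine PREG_OFFSET_CAPTURE with a capturing look-ahead, which avoids the separate substr() call. The sample input 'ab123456' and the $found variable are made up for illustration.

```php
<?php
$str = 'ab123456'; // illustrative input with a non-digit prefix

// With PREG_OFFSET_CAPTURE each entry of $matches[1] is [text, byte offset].
preg_match_all('/(?=(\d{3}))/', $str, $matches, PREG_OFFSET_CAPTURE);

$found = [];
foreach ($matches[1] as [$text, $offset]) {
    $found[$offset] = $text; // key: where the match starts, value: the match
}
print_r($found); // offsets 2, 3, 4, 5 map to 123, 234, 345, 456
```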
With \K inside a lookbehind:
preg_match_all('~(?<=\K..).~', '123456', $m);
print_r($m[0]);
Only one character (the third) is consumed; the first two are not, because they sit inside a lookbehind, which is a zero-width assertion. The \K still resets the reported start of the match, so the first two characters are returned along with the third.
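Note: you can't put all three characters inside the lookbehind and write (?<=\K...), because the match would then consume nothing and the regex engine would stay at the same position in the string forever. Also be aware that PCRE2 10.38 and later reject \K inside lookaround assertions by default, so this pattern may fail to compile on recent PHP versions.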