
preg_match_all how to get *all* combinations? Even overlapping ones

Tags: regex, php

Is there a way in the PHP regex functions to get all possible matches of a regex even if those matches overlap?

For example, to get all the 3-digit substrings with '/[\d]{3}/':

You might expect to get:

"123456" => ['123', '234', '345', '456']

But preg_match_all() only returns

['123', '456']

This is because it begins searching again after the matched substring (as noted in the documentation):

"After the first match is found, the subsequent searches are continued on from end of the last match."
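
A minimal reproduction of the behaviour described above may help. The variable names ($str, $count, $matches) are illustrative only:

```php
<?php
// Subject string from the question.
$str = '123456';

// A plain pattern consumes three digits per match, so the next search
// starts at offset 3 and the overlapping candidates are skipped.
$count = preg_match_all('/\d{3}/', $str, $matches);

print_r($matches[0]); // ['123', '456']
```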

Is there a way around this without writing a custom parser?

Jagu asked Mar 17 '14 12:03


3 Answers

Look-ahead assertions to the rescue!

preg_match_all('/(?=(\d{3}))/', $str, $matches);
print_r($matches[1]);

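The capturing group sits inside the look-ahead assertion, so it records whatever the assertion matches without consuming it. Because the assertion is zero-width, every overall match is empty and the engine moves forward only one character before trying again, which is what lets the captures overlap. As a result, $matches[0] contains only empty strings, while $matches[1] contains the expected captured substrings.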
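
As a rough sketch, the same idea could be wrapped in a small helper. The function name overlapping_matches() is hypothetical, and it assumes the pattern body is passed without delimiters and does not itself contain "~":

```php
<?php
/**
 * Hypothetical helper: returns the matches of $inner at every starting
 * offset of $subject, including matches that overlap each other.
 * $inner is a pattern body without delimiters and must not contain "~".
 */
function overlapping_matches(string $inner, string $subject): array
{
    // Wrap the pattern in a capturing group inside a zero-width look-ahead.
    preg_match_all('~(?=(' . $inner . '))~', $subject, $matches);

    // Group 1 holds the captured text; fall back to [] if the pattern failed.
    return $matches[1] ?? [];
}

print_r(overlapping_matches('\d{3}', '123456')); // ['123', '234', '345', '456']
```

Note that this yields at most one match per starting offset, namely the one the engine prefers there, so a variable-length pattern would not produce every possible substring.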

Ja͢ck answered Nov 03 '22 03:11


This may not be ideal, but at least it's something.

It looks like you could use a positive lookahead with PREG_OFFSET_CAPTURE to get the string offsets at which a 3-digit number starts, and then extract each number with substr():

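$str = '123456';

// Match one digit that is followed by two more; only the first digit is
// consumed, so the next search starts one character later.
preg_match_all('/\d(?=\d{2})/', $str, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[0] is [matched text, offset]; take 3 characters from the offset.
$numbers = array_map(function($m) use($str){
  return substr($str, $m[1], 3);
}, $matches[0]);

print_r($numbers);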

Output

Array
(
    [0] => 123
    [1] => 234
    [2] => 345
    [3] => 456
)
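
The offsets and the captured text can also be collected in one pass, without substr(). This is only a sketch that combines PREG_OFFSET_CAPTURE with a capturing group inside the lookahead; the $found array is an illustrative name:

```php
<?php
$str = '123456';

// With PREG_OFFSET_CAPTURE each entry becomes [matched text, byte offset].
preg_match_all('/(?=(\d{3}))/', $str, $matches, PREG_OFFSET_CAPTURE);

// Index the captured substrings by the offset at which they start.
$found = [];
foreach ($matches[1] as [$text, $offset]) {
    $found[$offset] = $text;
}

print_r($found); // [0 => '123', 1 => '234', 2 => '345', 3 => '456']
```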
maček answered Nov 03 '22 04:11


With \K inside a lookbehind:

preg_match_all('~(?<=\K..).~', '123456', $m);
print_r($m[0]);


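Only one character (the third) is consumed; the first two are not, because they sit inside a lookbehind, which is a zero-width assertion. The \K, however, resets the reported start of the match to that point, so the first two characters are returned along with the third.

Note: you can't put all three characters in the lookbehind and write (?<=\K...), because the match would then consume nothing and the regex engine would stay at the same position in the string forever. Also be aware that PCRE2 10.38 and later reject \K inside lookaround assertions by default, so this pattern may fail to compile on recent PHP builds.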

Casimir et Hippolyte answered Nov 03 '22 03:11