
Overlapping matches with preg_match_all and pattern ending with repeated character

Tags: regex, php

I'd like to do something similar to the question "preg_match_all how to get *all* combinations? Even overlapping ones": find all matches for a given pattern even when they overlap (e.g. matching the string ABABA against the pattern ABA should return 2 matches, not just the first one).
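For reference, a minimal sketch of the overlapping-match technique mentioned above, assuming the usual approach of capturing inside a zero-width lookahead. The variable name `$overlapping` is only illustrative.

```php
<?php
// The lookahead consumes no characters, so the engine retries at every
// position and the capture group can report matches that overlap.
preg_match_all('/(?=(ABA))/', 'ABABA', $matches, PREG_OFFSET_CAPTURE);

$overlapping = [];
foreach ($matches[1] as [$text, $offset]) {
    $overlapping[] = [$offset, $text];
}
// $overlapping holds [0, 'ABA'] and [2, 'ABA']
```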

But I have an additional constraint: my pattern can end with a repetition specifier. Let's use + as an example: this means pattern /A+/ and subject "AA" should return 3 matches:

  • Match "AA" starting at index 0
  • Match "A" starting at index 1
  • Match "A" starting at index 0

The following patterns, based on the solution suggested in the question above, fail to find all 3 matches:

  • Pattern /(?=(A+))/ finds only the first 2 matches but not the last one
  • Pattern /(?=(A+?))/ finds only the last 2 matches but not the first one
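The behaviour of the two patterns can be reproduced with a short sketch. The helper `lookahead_matches` is a hypothetical name introduced here for illustration; it only wraps `preg_match_all` with `PREG_OFFSET_CAPTURE` to list the start offset and text of capture group 1.

```php
<?php
// Hypothetical helper: returns [offset, text] pairs for capture group 1.
function lookahead_matches(string $regex, string $subject): array
{
    preg_match_all($regex, $subject, $m, PREG_OFFSET_CAPTURE);
    $out = [];
    foreach ($m[1] as [$text, $offset]) {
        $out[] = [$offset, $text];
    }
    return $out;
}

$greedy = lookahead_matches('/(?=(A+))/', 'AA');   // [0, 'AA'], [1, 'A']
$lazy   = lookahead_matches('/(?=(A+?))/', 'AA');  // [0, 'A'],  [1, 'A']
```

Either way the engine reports one capture per start position, so "AA" at index 0 and "A" at index 0 never appear in the same result set.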

My only workaround for now is to keep the greedy version and re-apply the pattern to each match minus its last character, repeating until it no longer matches, e.g.:

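$all_matches = array();
$pattern = 'A+';

// Greedy lookahead: captures the longest match at each start position.
preg_match_all("/(?=($pattern))/", "AA", $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    do {
        $all_matches[] = $match[1];
        // Drop the last character and try the pattern again on what is left.
        $subject = substr($match[1], 0, -1);
    }
    while (preg_match("/^($pattern)/", $subject, $match));
}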

Is there any better solution to achieve this using preg_match_all or similar?

asked Jan 17 '26 05:01 by r3c


1 Answer

You want to get several matches at one index, which is impossible with a single regex matching operation, because the engine reports at most one match per start position. You actually need to

  • Enumerate all substrings of your string and
  • Only keep those that fully match your pattern.

See the PHP demo:

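function find_substrings($r, $s) {
  $res = array();
  // Anchor the pattern so a substring only counts if it matches in full;
  // the group keeps alternations such as A|B inside the anchors.
  $r = '~^(?:' . $r . ')$~';
  $len = strlen($s);
  // Try every substring that starts at $q and ends just before $w.
  for ($q = 0; $q < $len; ++$q) {
    for ($w = $q; $w <= $len; ++$w) {
        $cur = substr($s, $q, $w - $q);
        if (preg_match($r, $cur)) {
            $res[] = $cur;
        }
    }
  }
  return $res;
}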
print_r(find_substrings("ABA", "ABABA"));
// => Array ( [0] => ABA [1] => ABA )
print_r(find_substrings("A+", "AA"));
// => Array ( [0] => A [1] => AA [2] => A )
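
Since the question lists each match together with its start index, here is a sketch of a variant that records the offset as well. The name `find_substrings_with_offsets` is hypothetical, and the code assumes a single-byte subject and a pattern that does not contain the `~` delimiter. Unlike the demo above, it skips empty substrings.

```php
<?php
// Hypothetical variant of find_substrings() that also records where each
// match starts. Assumes a single-byte subject and a pattern without '~'.
function find_substrings_with_offsets(string $pattern, string $subject): array
{
    // The D modifier makes $ match only at the very end of the candidate.
    $regex = '~^(?:' . $pattern . ')$~D';
    $len = strlen($subject);
    $found = [];
    for ($start = 0; $start < $len; ++$start) {
        // Begin at $start + 1 so that empty substrings are never tested.
        for ($end = $start + 1; $end <= $len; ++$end) {
            $candidate = substr($subject, $start, $end - $start);
            if (preg_match($regex, $candidate) === 1) {
                $found[] = [$start, $candidate];
            }
        }
    }
    return $found;
}

$result = find_substrings_with_offsets('A+', 'AA');
// [[0, 'A'], [0, 'AA'], [1, 'A']]
```

Two caveats apply to this kind of enumeration: the number of candidate substrings grows quadratically with the subject length, and because each candidate is matched in isolation, lookarounds or word boundaries in the pattern cannot see the surrounding text.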

answered Jan 19 '26 18:01 by Wiktor Stribiżew


