Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to get all possible overlapping matches for a string

I'm working on the MIU system problem from "Gödel, Escher, Bach" chapter 2.

One of the rules states

Rule III: If III occurs in one of the strings in your collection, you may make a new string with U in place of III.

Which means that the string MIII can become MU, but for other, longer strings there may be multiple possibilities [matches in brackets]:

  • MIIII could yield
    • M[III]I >> MUI
    • MI[III] >> MIU
  • MUIIIUIIIU could yield
    • MU[III]UIIIU >> MUUUIIIU
    • MUIIIU[III]U >> MUIIIUUU
  • MUIIIIU could yield
    • MU[III]IU >> MUUIU
    • MUI[III]U >> MUIUU

Clearly regular expressions such as /(.*)III(.*)/ are helpful, but I can't seem to get them to generate every possible match, just the first one it happens to find.

Is there a way to generate every possible match?

(Note, I can think of ways to do this entirely manually, but I am hoping there is a better way using the built in tools, regex or otherwise)

(Edited to clarify overlapping needs.)

like image 558
Colin Dabritz Avatar asked Aug 03 '13 05:08

Colin Dabritz


2 Answers

Here's the regex you need: /III/g - simple enough, right? Now here's how you use it:

var text = "MUIIIUIIIU", find = "III", replace "U",
    regex = new RegExp(find,"g"), matches = [], match;
while(match = regex.exec(text)) {
    matches.push(match);
    regex.lastIndex = match.index+1;
}

That regex.lastIndex... line overrides the usual regex behaviour of not matching results that overap. Also I'm using a RegExp constructor to make this more flexible. You could even build it into a function this way.

Now you have an array of match objects, you can do this:

matches.forEach(function(m) { // older browsers need a shim or old-fashioned for loop
    console.log(text.substr(0,m.index)+replace+text.substr(m.index+find.length));
});

EDIT: Here is a JSFiddle demonstrating the above code.

like image 154
Niet the Dark Absol Avatar answered Nov 06 '22 18:11

Niet the Dark Absol


Sometimes regexes are overkill. In your case a simple indexOf might be fine too!

Here is, admittedly, a hack, but you can transform it into pretty, reusable code on your own:

var s = "MIIIIIUIUIIIUUIIUIIIIIU";
var results = [];
for (var i = 0; true; i += 1) {
    i = s.indexOf("III", i);
    if (i === -1) {
        break;
    }
    results.push(i);
}
console.log("Match positions: " + JSON.stringify(results));

It takes care of overlaps just fine, and at least to me, the indexOf just looks simpler.

like image 29
Ray Toal Avatar answered Nov 06 '22 19:11

Ray Toal