Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find longest repeating substring in JavaScript using regular expressions

I'd like to find the longest repeating string within a string, implemented in JavaScript and using a regular-expression based approach.

I have an PHP implementation that, when directly ported to JavaScript, doesn't work.

The PHP implementation is taken from an answer to the question "Find longest repeating strings?":

preg_match_all('/(?=((.+)(?:.*?\2)+))/s', $input, $matches, PREG_SET_ORDER);

This will populate $matches[0][X] (where X is the length of $matches[0]) with the longest repeating substring to be found in $input. I have tested this with many input strings and found am confident the output is correct.

The closest direct port in JavaScript is:

var matches = /(?=((.+)(?:.*?\2)+))/.exec(input);

This doesn't give correct results

input                  Excepted result   matches[0][X]
======================================================
inputinput             input             input
7inputinput            input             input
inputinput7            input             input
7inputinput7           input             7
XXinputinputYY         input             XX

I'm not familiar enough with regular expressions to understand what the regular expression used here is doing.

There are certainly algorithms I could implement to find the longest repeating substring. Before I attempt to do that, I'm hoping a different regular expression will produce the correct results in JavaScript.

Can the above regular expression be modified such that the expected output is returned in JavaScript? I accept that this may not be possible in a one-liner.

like image 697
Jon Cram Avatar asked Oct 09 '10 22:10

Jon Cram


People also ask

How do you find the longest repeated substring in a string?

The maximum value of LCSRe(i, j) provides the length of the longest repeating substring and the substring itself can be found using the length and the ending index of the common suffix.

How do you find the longest repeating substring in C++?

Longest Repeating Substring in C++ Suppose we have a string S, we have to find the length of the longest repeating substring(s). We will return 0 if no repeating substring is present. So if the string is like “abbaba”, then the output will be 2. As the longest repeating substring is “ab” or “ba”.


1 Answers

Javascript matches only return the first match -- you have to loop in order to find multiple results. A little testing shows this gets the expected results:

function maxRepeat(input) {
 var reg = /(?=((.+)(?:.*?\2)+))/g;
 var sub = ""; //somewhere to stick temp results
 var maxstr = ""; // our maximum length repeated string
 reg.lastIndex = 0; // because reg previously existed, we may need to reset this
 sub = reg.exec(input); // find the first repeated string
 while (!(sub == null)){
  if ((!(sub == null)) && (sub[2].length > maxstr.length)){
   maxstr = sub[2];
  }
  sub = reg.exec(input);
  reg.lastIndex++; // start searching from the next position
 }
 return maxstr;
}

// I'm logging to console for convenience
console.log(maxRepeat("aabcd"));             //aa
console.log(maxRepeat("inputinput"));        //input
console.log(maxRepeat("7inputinput"));       //input
console.log(maxRepeat("inputinput7"));       //input
console.log(maxRepeat("7inputinput7"));      //input
console.log(maxRepeat("xxabcdyy"));          //x
console.log(maxRepeat("XXinputinputYY"));    //input

Note that for "xxabcdyy" you only get "x" back, as it returns the first string of maximum length.

like image 94
Ben Doom Avatar answered Oct 17 '22 08:10

Ben Doom