Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I find the largest sequence in a string that is repeated at least once?

Trying to solve the following problem:

Given a string of arbitrary length, find the longest substring that occurs more than one time within the string, with no overlaps.

For example, if the input string was ABCABCAB, the correct output would be ABC. You couldn't say ABCAB, because that only occurs twice where the two substrings overlap, which is not allowed.

Is there any way to solve this reasonably quickly for strings containing a few thousand characters?

(And before anyone asks, this is not homework. I'm looking at ways to optimize the rendering of Lindenmayer fractals, because they tend to take excessive amounts of time to draw at high iteration levels with a naive turtle graphics system.)

like image 831
Mason Wheeler Avatar asked Aug 07 '12 20:08

Mason Wheeler


2 Answers

Here's an example for a string of length 11, which you can generalize

  • Set chunk length to floor(11/2) = 5

  • Scan the string in chunks of 5 characters left to looking for repeats. There will be 3 comparisons

    Left      Right
    Offset    Offset
     0         5
     0         6
     1         5
  • If you found a duplicate you're done. Otherwise reduce the chunk length to 4 and repeat until chunk length goes to zero.

Here's some (obviously untested) pseudocode:

String s
int len = floor(s.length/2)
for int i=len; i>0; i--
    for j=0; j<=len-(2*i); j++
        for k=j+i; k<=len-i; k++
            if s.substr(j,j+i) == s.substr(k,k+i)
                return s.substr(j,j+i)
return null

There may be an off-by-one error in there, but the approach should be sound (and minimal).

like image 119
Jim Garrison Avatar answered Oct 13 '22 18:10

Jim Garrison


it looks like a suffix tree problem. Create the suffix tree, then find the biggest compressed branch with more than one child (occurs more than once in the original string). The number of letters in that compressed branch should be the size of the biggest subsequence.

i found something similar here: http://www.coderanch.com/t/370396/java/java/Algorithm-wanted-longest-repeating-substring

Looks like it can be done in O(n).

like image 45
piotrek Avatar answered Oct 13 '22 18:10

piotrek