Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple string compression: removing consecutive repeating substrings

I was asked this question recently in an Amazon interview .

Given a string, remove consecutive repeating substrings from it. If there are multiple consecutive intersecting substrings, remove the largest of those.

To make it clear, here are some examples:

  • input: aabcccddeaaa
    output: abcdea (Compressing the consecutive repeated characters)
  • input: abababcdeee
    output: abcde (Compressing consecutive repeated substrings)
  • input: ababcdabcd
    output: ababcd

(You can compress 'ab' or 'abcd', but as length of 'abcd' is larger you prefer compressing the larger one.)

I could not come up with an efficient implementation, anyone knows a good approach to this?

As this was an interview question, please refrain from using complicated library functions.

like image 479
Siddhanjay Godre Avatar asked Jul 13 '15 06:07

Siddhanjay Godre


1 Answers

For a string X, we can find the largest consecutive repeating substring in O(n^2) using Z-algorithm which Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from pat[i] which is also a prefix of pat (Source)

For each suffix of X start at i, applying Z-algorithm for this substring:

int result = 0;
for(int i = 0; i < X.length(); i++)
   int[]z = Z-algo(X.substring(i)); //this take O(n)
   for(int j = i + result + 1; j < X.length(); j++)
       if(z[j] >= j - i + 1)
          result = (j - i + 1);

Repeating the above procedure until we cannot find any repeating substring, we can obtain a O(n^3) algorithm.

Note: after reread the question, especially the last example, I figure out that valid repeating substrings are only limited from the original's substrings. So, the time complexity can be reduced to O(n^2 log n) by using a max heap.

like image 119
Pham Trung Avatar answered Sep 29 '22 04:09

Pham Trung