I was asked this question recently in an Amazon interview .
Given a string, remove consecutive repeating substrings from it. If there are multiple consecutive intersecting substrings, remove the largest of those.
To make it clear, here are some examples:
aabcccddeaaaabcdea   (Compressing the consecutive repeated characters)abababcdeeeabcde (Compressing consecutive repeated substrings)ababcdabcdababcd
(You can compress 'ab' or 'abcd', but as length of 'abcd' is larger you prefer compressing the larger one.)
I could not come up with an efficient implementation, anyone knows a good approach to this?
As this was an interview question, please refrain from using complicated library functions.
For a string X, we can find the largest consecutive repeating substring in O(n^2) using Z-algorithm which Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from pat[i] which is also a prefix of pat (Source)
For each suffix of X start at i, applying Z-algorithm for this substring:
int result = 0;
for(int i = 0; i < X.length(); i++)
   int[]z = Z-algo(X.substring(i)); //this take O(n)
   for(int j = i + result + 1; j < X.length(); j++)
       if(z[j] >= j - i + 1)
          result = (j - i + 1);
Repeating the above procedure until we cannot find any repeating substring, we can obtain a O(n^3) algorithm.
Note: after reread the question, especially the last example, I figure out that valid repeating substrings are only limited from the original's substrings. So, the time complexity can be reduced to O(n^2 log n) by using a max heap.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With