I was asked this question recently in an Amazon interview .
Given a string, remove consecutive repeating substrings from it. If there are multiple consecutive intersecting substrings, remove the largest of those.
To make it clear, here are some examples:
aabcccddeaaa
abcdea
(Compressing the consecutive repeated characters)abababcdeee
abcde
(Compressing consecutive repeated substrings)ababcdabcd
ababcd
(You can compress 'ab
' or 'abcd
', but as length of 'abcd
' is larger you prefer compressing the larger one.)
I could not come up with an efficient implementation, anyone knows a good approach to this?
As this was an interview question, please refrain from using complicated library functions.
For a string X
, we can find the largest consecutive repeating substring in O(n^2) using Z-algorithm which Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from pat[i] which is also a prefix of pat
(Source)
For each suffix of X
start at i
, applying Z-algorithm for this substring:
int result = 0;
for(int i = 0; i < X.length(); i++)
int[]z = Z-algo(X.substring(i)); //this take O(n)
for(int j = i + result + 1; j < X.length(); j++)
if(z[j] >= j - i + 1)
result = (j - i + 1);
Repeating the above procedure until we cannot find any repeating substring, we can obtain a O(n^3) algorithm.
Note: after reread the question, especially the last example, I figure out that valid repeating substrings are only limited from the original's substrings. So, the time complexity can be reduced to O(n^2 log n) by using a max heap.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With