Given the following problem :
Definition :
Let S be a string over alphabet Σ .
S'
is the smallest period ofS
ifS'
is the smallest string such that :
S = (S')^k (S'')
,where
S''
is a prefix ofS
. If no suchS'
exists , thenS
is not periodic .Example :
S = abcabcabcabca
. Thenabcabc
is a period sinceS = abcabc abcabc a
, but the smallest period isabc
sinceS = abc abc abc abc a
.Give an algorithm to find the smallest period of input string
S
or declare thatS
is not periodic.Hint : You can do that in
O(n)
...
My solution : We use KMP , which runs in O(n) .
By the definition of the problem , S = (S')^k (S'') , then I think that if we create an automata for the shortest period , and find a way to find that shortest period , then I'm done.
The problem is where to put the FAIL arrow of the automata ...
Any ideas would be greatly appreciated ,
Regards
The main component of the KMP algorithm is the prefix function. It is defined as for a string s of length n, π[i] is the length of the longest proper prefix of the substring s[0…i], which is also a suffix of this substring where the proper prefix of a string is a prefix that is not equal to the string itself.
Alright so this problem can definitely be solved in O(n), we just have to cleverly use KMP as you suggested.
Solving the longest proper prefix which is also a suffix problem is a vital part of KMP that we will make use of.
The longest proper prefix which is also a suffix problem is a mouthful so let's just call it the prefix suffix problem for now.
The prefix suffix problem can be pretty hard to understand so I'll include some examples.
The prefix suffix solution for "abcabc" is "abc" since that is the longest string which is both a proper prefix and a proper suffix (proper prefixes and suffixes cannot be the entire string).
The prefix suffix solution for "abcabca" is "a"
Hmmmmmmmmm wait a minute if we just chop off "a" from the end of "abcabca" we are left with "abcabc" and if we get the solution("abc") for this new string and chop it off again we are left with "abc" Hmmmmmmmmm. Very interesting.(This is pretty much the solution but I will talk about why this works)
Alright let's try to formalize this intuition a bit more and see if we can arrive at a solution.
I will use one key assumption in my argument:
The smallest period of our pattern is a valid period of every larger period in our pattern
Let us store the prefix suffix solution for the first i
characters of our pattern in lps[i]
. This lps
array can be calculated in O(n)
and it is used in the KMP algorithm, you can read more about how to calculate it in O(n)
here: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
Just so we are clear I will list some examples of some lps
arrays
Pattern:"aaaaa"
lps: [0, 1, 2, 3, 4]
Pattern:"aabbcc"
lps: [0, 1, 0, 0, 0, 0]
Pattern:"abcabcabc"
lps: [0, 0, 0, 1, 2, 3, 4, 5, 6]
Alright now lets define some variables, to help us find out why this lps
array is useful.
Let
l
be the length of our pattern, and letk
be the last value in our lps array(k=lps[l-1]
)
The value k
tells us that the first k
characters of our string are the same as the last k
characters of our string. And we can use this fact to find a period!
Using this information we can now show that the prefix consisting of the first l-k
characters of our string form a valid period. This is clear because the next k
characters which are not in our prefix must match the first k
characters of our prefix, because of how we defined our lps
array. The first k
characters that from our prefix must be the same as the last k
characters which form our suffix.
In practice you can implement this with a simple while loop as shown below where index
marks the end of the suffix you are currently considering to be the smallest period.
public static void main(String[] args){
String pattern="abcabcabcabca";
int[] lps= calculateLPS(pattern);
//start at the end of the string
int index=lps.length-1;
while(lps[index]!=0){
//shift back
index-=lps[index];
}
System.out.println(pattern.substring(0,index+1));
}
And since calculating lps
happens in O(n)
, and you are always moving at least 1 step back in the while loop the time complexity for the whole procedure is simply O(n)
I borrowed heavily from the geeksForGeeks implementation of KMP in my calculateLPS() method if you would like to see my exact code it is below, but I reccomend that you also look at their explanation: https://www.geeksforgeeks.org/kmp-algorithm-for-pattern-searching/
static int[] calculateLPS(String pat) {
int[] lps = new int[pat.length()];
int len = 0;
int i = 1;
lps[0] = 0;
while (i < pat.length()) {
if (pat.charAt(i) == pat.charAt(len)) {
len++;
lps[i] = len;
i++;
}
else {
if (len != 0) {
len = lps[len - 1];
}
else {
lps[i] = len;
i++;
}
}
}
System.out.println(Arrays.toString(lps));
return lps;
}
Last but not least, thanks for posting such an interesting problem it was pretty fun to figure out! Also I am new to this so please let me know if any part of my explanation doesn't make sense.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With