KMP algorithm for string matching.
Following is the code I found online for computing the longest prefix-suffix array:
Defination:
lps[i] = the longest proper prefix of pat[0..i]
which is also a suffix of pat[0..i].
Code:
void computeLPSArray(char *pat, int M, int *lps)
{
int len = 0; // length of the previous longest prefix suffix
int i;
lps[0] = 0; // lps[0] is always 0
i = 1;
// the loop calculates lps[i] for i = 1 to M-1
while(i < M)
{
if(pat[i] == pat[len])
{
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
if( len != 0 )
{
// This is tricky. Consider the example AAACAAAA and i = 7.
len = lps[len-1]; //*****************
// Also, note that we do not increment i here
}
else // if (len == 0)
{
lps[i] = 0;
i++;
}
}
}
}
Can I use len = len-1
instead of len = lps[len-1]
?
because len always counts the prefix length like from [0 .. someIndex]. Then why use lps for assignment here? Following are the cases for which I tested which work fine(first line is the pattern and subsequent two lines are the result for original and modified assignment to len
) :
a a a b a b c
0 1 2 0 1 0 0
0 1 2 0 1 0 0
a b c b a b c
0 0 0 0 1 2 3
0 0 0 0 1 2 3
a a b c b a b
0 1 0 0 0 1 0
0 1 0 0 0 1 0
Code here with both variations written : http://ideone.com/qiSrUo
The main component of the KMP algorithm is the prefix function. It is defined as for a string s of length n, π[i] is the length of the longest proper prefix of the substring s[0…i], which is also a suffix of this substring where the proper prefix of a string is a prefix that is not equal to the string itself.
Knuth Morris Pratt (KMP) is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than one in the sub-pattern, it uses that property to improve the time complexity, also for in the worst case. The time complexity of KMP is O(n).
The time complexity of KMP algorithm is O(n) in the worst case. KMP (Knuth Morris Pratt) Pattern Searching The Naive pattern searching algorithm doesn't work well in cases where we see many matching characters followed by a mismatching character. Following are some examples.
In the KMP string matching algorithm, we calculate the LPS array such that LPS[i] denotes the longest proper prefix that is also a suffix of the substring ending at index 'i'. We can use the same approach to calculate the LPS array. Our final answer will be LPS[n-1].
Following a case for which it does not work:
i 0 1 2 3 4 5
p A B A B B A
c1 0 0 1 2 0 1
c2 0 0 1 2 2 3
The reason being:
At i=4, len=2
p[i]='B' and p[len]='A' //Mismatch!
lps string upto i=3: AB(0-1 prefix), (2-3 suffix)
-------------------------------
i=4
Next charecter: B
len=2 // longest prefix suffix length
Charecter looking for : A (=p[len])
So upto i=3 we had AB(0-1) as the prefix that matched with suffix AB(2-3), but now at i=4 there is a mismatch so we see that we can't extend the original prefix(0-1) so the position to be checked is the prefix found prior to "AB" which is done by lps[len-1] < -1 as the array starts from 0 > and this is not necessarily len-1 as we may need to step back further than that to get the new longest prefix suffix.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With