I am reading about KMP
for string matching.
It needs a preprocessing of the pattern by building a prefix table.
For example for the string ababaca
the prefix table is: P = [0, 0, 1, 2, 3, 0, 1]
But I am not clear on what does the numbers show. I read that it helps to find matches of the pattern when it shifts but I can not connect this info with the numbers in the table.
Knuth Morris Pratt (KMP) is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than one in the sub-pattern, it uses that property to improve the time complexity, also for in the worst case. The time complexity of KMP is O(n).
The Prefix Function (Π): The Prefix Function, Π for a pattern encapsulates knowledge about how the pattern matches against the shift of itself. This information can be used to avoid a useless shift of the pattern 'p. ' In other words, this enables avoiding backtracking of the string 'S.
What is the definition of the prefix function that is used by the Knuth/Morris/Pratt algorithm? For a given pattern string p, the value pi(k) is defined to be the length of the longest proper prefix of p that is also a suffix of p. Show the Knuth/Morris/Pratt prefix function for pattern aabaabcab. n. pi(n)
We search for lps in sub-patterns. More clearly we focus on sub-strings of patterns that are either prefix and suffix. For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i].
Every number belongs to corresponding prefix ("a", "ab", "aba", ...) and for each prefix it represents length of longest suffix of this string that matches prefix. We do not count whole string as suffix or prefix here, it is called self-suffix and self-prefix (at least in Russian, not sure about English terms).
So we have string "ababaca". Let's look at it. KMP computes Prefix Function for every non-empty prefix. Let's define s[i]
as the string, p[i]
as the Prefix function. prefix and suffix may overlap.
+---+----------+-------+------------------------+ | i | s[0:i] | p[i] | Matching Prefix/Suffix | +---+----------+-------+------------------------+ | 0 | a | 0 | | | 1 | ab | 0 | | | 2 | aba | 1 | a | | 3 | abab | 2 | ab | | 4 | ababa | 3 | aba | | 5 | ababac | 0 | | | 6 | ababaca | 1 | a | | | | | | +---+----------+-------+------------------------+
Simple C++ code that computes Prefix function of string S:
vector<int> prefixFunction(string s) { vector<int> p(s.size()); int j = 0; for (int i = 1; i < (int)s.size(); i++) { while (j > 0 && s[j] != s[i]) j = p[j-1]; if (s[j] == s[i]) j++; p[i] = j; } return p; }
This code may not be the shortest, but easy to understand flow of code. Simple Java Code for calculating prefix-Array-
String pattern = "ababaca"; int i = 1, j = 0; int[] prefixArray = new int[pattern.length]; while (i < pattern.length) { while (pattern.charAt(i) != pattern.charAt(j) && j > 0) { j = prefixArray[j - 1]; } if (pattern.charAt(i) == pattern.charAt(j)) { prefixArray[i] = j + 1; i++; j++; } else { prefixArray[i] = j; i++; } } for (int k = 0; k < prefixArray.length; ++k) { System.out.println(prefixArray[k]); }
It produces the required output-
0 0 1 2 3 0 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With