Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

KMP prefix table

I am reading about KMP for string matching.
It needs a preprocessing of the pattern by building a prefix table.
For example for the string ababaca the prefix table is: P = [0, 0, 1, 2, 3, 0, 1]
But I am not clear on what does the numbers show. I read that it helps to find matches of the pattern when it shifts but I can not connect this info with the numbers in the table.

like image 687
Cratylus Avatar asked Dec 09 '12 21:12

Cratylus


People also ask

What is KMP algorithm with example?

Knuth Morris Pratt (KMP) is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than one in the sub-pattern, it uses that property to improve the time complexity, also for in the worst case. The time complexity of KMP is O(n).

Why is prefix needed in KMP?

The Prefix Function (Π): The Prefix Function, Π for a pattern encapsulates knowledge about how the pattern matches against the shift of itself. This information can be used to avoid a useless shift of the pattern 'p. ' In other words, this enables avoiding backtracking of the string 'S.

What is the definition of the prefix function that is used by the Knuth Morris Pratt algorithm show the Knuth Morris Pratt prefix function for pattern Aabaabcab?

What is the definition of the prefix function that is used by the Knuth/Morris/Pratt algorithm? For a given pattern string p, the value pi(k) is defined to be the length of the longest proper prefix of p that is also a suffix of p. Show the Knuth/Morris/Pratt prefix function for pattern aabaabcab. n. pi(n)

How do you calculate LPS in KMP?

We search for lps in sub-patterns. More clearly we focus on sub-strings of patterns that are either prefix and suffix. For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i].


2 Answers

Every number belongs to corresponding prefix ("a", "ab", "aba", ...) and for each prefix it represents length of longest suffix of this string that matches prefix. We do not count whole string as suffix or prefix here, it is called self-suffix and self-prefix (at least in Russian, not sure about English terms).

So we have string "ababaca". Let's look at it. KMP computes Prefix Function for every non-empty prefix. Let's define s[i] as the string, p[i] as the Prefix function. prefix and suffix may overlap.

+---+----------+-------+------------------------+ | i |  s[0:i]  | p[i]  | Matching Prefix/Suffix | +---+----------+-------+------------------------+ | 0 | a        |     0 |                        | | 1 | ab       |     0 |                        | | 2 | aba      |     1 | a                      | | 3 | abab     |     2 | ab                     | | 4 | ababa    |     3 | aba                    | | 5 | ababac   |     0 |                        | | 6 | ababaca  |     1 | a                      | |   |          |       |                        | +---+----------+-------+------------------------+ 

Simple C++ code that computes Prefix function of string S:

vector<int> prefixFunction(string s) {     vector<int> p(s.size());     int j = 0;     for (int i = 1; i < (int)s.size(); i++) {         while (j > 0 && s[j] != s[i])             j = p[j-1];          if (s[j] == s[i])             j++;         p[i] = j;     }        return p; } 
like image 179
imslavko Avatar answered Sep 25 '22 12:09

imslavko


This code may not be the shortest, but easy to understand flow of code. Simple Java Code for calculating prefix-Array-

    String pattern = "ababaca";     int i = 1, j = 0;     int[] prefixArray = new int[pattern.length];     while (i < pattern.length) {          while (pattern.charAt(i) != pattern.charAt(j) && j > 0) {             j = prefixArray[j - 1];          }         if (pattern.charAt(i) == pattern.charAt(j)) {             prefixArray[i] = j + 1;             i++;             j++;          } else {             prefixArray[i] = j;             i++;         }     }      for (int k = 0; k < prefixArray.length; ++k) {         System.out.println(prefixArray[k]);     } 

It produces the required output-

0 0 1 2 3 0 1

like image 27
Yogesh Sanchihar Avatar answered Sep 25 '22 12:09

Yogesh Sanchihar