Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can the KMP failure function be computed in O(n) time?

Wikipedia claims that the failure function table can be computed in O(n) time.

Let's look at its `canonical' implementation (in C++):

vector<int> prefix_function (string s) {
    int n = (int) s.length();
    vector<int> pi (n);
    for (int i=1; i<n; ++i) {
        int j = pi[i-1];
        while (j > 0 && s[i] != s[j])
            j = pi[j-1];
        if (s[i] == s[j])  ++j;
        pi[i] = j;
    }
    return pi;
}

Why does it work in O(n) time, even if there is an inner while-loop? I'm not really strong at the analysis of algorithms, so may somebody explain it?

like image 611
vortexxx192 Avatar asked Sep 07 '13 07:09

vortexxx192


People also ask

What is KMP failure function?

f - failure function (by definition, this is the length of the longest prefix of the string which is a suffix also) Here how I built it step by step: f(a) = 0 (always = 0 for one letter) f(aa) = 1 (one letter 'a' is both a prefix and suffix) f(aab) = 0 (there is no the same suffixes and prefixes: a != b, aa != ab)

What is the best case complexity of KMP algorithm?

The best case for KMP is O(k) where k is the length of the search term, and happens when the string to be searched inside is of length 0.


1 Answers

There's already two answers here that are correct, but I often think a fully laid out proof can make things clearer. You said you wanted an answer for a 9-year-old, but I don't think it's feasible (I think it's easy to be fooled into thinking it's true without actually having any intuition for why it's true). Maybe working through this answer will help.

First off, the outer loop runs n times clearly because i is not modified within the loop. The only code within the loop that could run more than once is the block

while (j > 0 && s[i] != s[j])
{   
    j = pi[j-1]
}   

So how many times can that run? Well notice that every time that condition is satisfied we decrease the value of j which, at this point, is at most pi[i-1]. If it hits 0 then the while loop is done. To see why this is important, we first prove a lemma (you're a very smart 9-year-old):

pi[i] <= i

This is done by induction. pi[0] <= 0 since it's set once in the initialization of pi and never touched again. Then inductively we let 0 < k < n and assume the claim holds for 0 <= a < k. Consider the value of p[k]. It's set precisely once in the line pi[i] = j. Well how big can j be? It's initialized to pi[k-1] <= k-1 by induction. In the while block it then may be updated to pi[j-1] <= j-1 < pi[k-1]. By another mini-induction you can see that j will never increase past pi[k-1]. Hence after the while loop we still have j <= k-1. Finally it might be incremented once so we have j <= k and so pi[k] = j <= k (which is what we needed to prove to finish our induction).

Now returning back to the original point, we ask "how many times can we decrease the value of j"? Well with our lemma we can now see that every iteration of the while loop will monotonically decrease the value of j. In particular we have:

pi[j-1] <= j-1 < j 

So how many times can this run? At most pi[i-1] times. The astute reader might think "you've proven nothing! We have pi[i-1] <= i-1 but it's inside the while loop so it's still O(n^2)!". The slightly more astute reader notices this extra fact:

However many times we run j = pi[j-1] we then decrease the value of pi[i] which shortens the next iteration of the loop!

For example, let's say j = pi[i-1] = 10. But after ~6 iterations of the while loop we have j = 3 and let's say it gets incremented by 1 in the s[i] == s[j] line so j = 4 = pi[i]. Well then at the next iteration of the outer loop we start with j = 4... so we can only execute the while at most 4 times.

The final piece of the puzzle is that ++j runs at most once per loop. So it's not like we can have something like this in our pi vector:

0 1 2 3 4 5 1 6 1 7 1 8 1 9 1
           ^   ^   ^   ^   ^
Those spots might mean multiple iterations of the while loop if this 
could happen

To make this actually formal you might establish the invariants described above and then use induction to show that the total number of times that while loop is run, summed with pi[i] is at most i. From that, it follows that the total number of times the while loop is run is O(n) which means that the entire outer loop has complexity:

O(n)     // from the rest of the outer loop excluding the while loop
+ O(n)   // from the while loop
=> O(n) 
like image 76
rliu Avatar answered Sep 23 '22 13:09

rliu