Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Relation between KMP algorithm and Z algorithm

KMP and Z algorithms are well known algorithms for string searching,

KMP algorithm deals with finding the patterns through a KMP failure function which is defined as (pat being the searching pattern)

lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i].

e.g for string "abcab" it would be [0, 0, 0, 1, 2]

where as Z algorithm uses the z function which is defined as:

Given a string S of length n, the Z Algorithm produces an array Z where Z[i] is the length of the longest substring starting from pat[i] which is also a prefix of pat.

Now the question is can we achieve the Z function by the use of KMP algorithm? What I am searching for is some modifications in the lps array that leads to the same results as the Z[i] array.

like image 763
Ninja420 Avatar asked Aug 29 '13 21:08

Ninja420


2 Answers

NOTE: algorithm is wrong

for i in range(0, len(s)):
    if lps[i] != 0:
        Z[i - lps[i] + 1] = lps[i]

After that in Z[i] will be the maximum length of the suffix, that starts in position i and which is also a prefix of the string.

EDIT

As nikhil_vyas noted, the proposed algorithm does not solve your problem. What it actually does is partially filling Z array with the longest suffixes and some others. Such incomplete array can basically help you to solve several "find the longest something in the string" problems, but it does not answer your question.

The easiest way to rebuild Z array having lps array that comes to my mind is to build the string, corresponding to the lps array and then build Z array for that string. But I am not sure whether it suits your definition of "some modifications in the lps array".

like image 134
Mikhail Melnik Avatar answered Sep 25 '22 07:09

Mikhail Melnik


I think this will do it.

def Z(lps):
    # First assume that we always need to restart when we find a mismatch.
    Z = [0] * len(lps)

    # Step through adjacent pairs.
    itr = enumerate(zip(lps, lps[1:]), start=1)
    for i, (prev, cur) in itr:
        if cur <= prev: # suffix stopped growing
            Z[i - prev] = prev # Mark this suffix at its beginning.

    # Ending the string is also a way to stop growing the suffix.
    if cur > 0: # if we were still growing a suffix
        # At end of loop, cur is the new prev, and i+1 is the new i.
        # (i == len(lps) - 1, cur == lps[-1])
        Z[i+1 - cur] = cur

    return Z

Samples:

Z([0,0,0,1,2]) #=> [0,0,0,2,0]
Z([0,0,1,2,1,2]) #=> [0,0,2,0,2,0]
like image 27
leewz Avatar answered Sep 21 '22 07:09

leewz