Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Knuth-Morris-Pratt algorithm in Haskell

I have a trouble with understanding this implementation of the Knuth-Morris-Pratt algorithm in Haskell.

http://twanvl.nl/blog/haskell/Knuth-Morris-Pratt-in-Haskell

In particular I don't understand the construction of the automaton. I know that it uses the "Tying the Knot" method to construct it, but it isn't clear to me and I also don't know why it should have the right complexity.

Another thing I would like to know is whether you think that this implementation could be easily generalized to implement the Aho-Corasick algorithm.

Thanks for your answers!

like image 312
pizet Avatar asked May 22 '13 14:05

pizet


People also ask

How does Knuth Morris Pratt algorithm work?

KMP algorithm is used to find a "Pattern" in a "Text". This algorithm campares character by character from left to right. But whenever a mismatch occurs, it uses a preprocessed table called "Prefix Table" to skip characters comparison while matching.

What is Knuth Morris Pratt algorithm in data structure?

Knuth Morris Pratt (KMP) is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than one in the sub-pattern, it uses that property to improve the time complexity, also for in the worst case. The time complexity of KMP is O(n).

What problem does the Knuth Morris Pratt algorithm solve?

The KMP algorithm is a solution to the string search problem wherein we are required to find if a given pattern string occurs in another main string. It is one of the advanced string matching algorithm that was conceived by Donald Knuth, James H. Morris and Vaughan Pratt, hence the name "KMP algorithm".

Is Knuth Morris Pratt dynamic programming?

The KMP algorithm is also a matter of dynamic programming. Our public account article directory has a series of articles that specialize in dynamic programming, and all are based on a set of frameworks.


1 Answers

So here's the algorithm:

makeTable :: Eq a => [a] -> KMP a
makeTable xs = table
   where table = makeTable' xs (const table)

makeTable' []     failure = KMP True failure
makeTable' (x:xs) failure = KMP False test
   where  test  c = if c == x then success else failure c
          success = makeTable' xs (next (failure x))

Using that, let's look at the table constructed for "shoeshop":

makeTable "shoeshop" = table0

table0 = makeTable' "shoeshop" (const table0)
       = KMP False test0

test0 c = if c == 's' then success1 else const table0 c
        = if c == 's' then success1 else table0

success1 = makeTable' "hoeshop" (next (const table0 's'))
         = makeTable' "hoeshop" (next table0)
         = makeTable' "hoeshop" test0
         = KMP False test1

test1 c = if c == 'h' then success2 else test0 c

success2 = makeTable' "oeshop" (next (test0 'h'))
         = makeTable' "oeshop" (next table0)
         = makeTable' "oeshop" test0
         = makeTable' "oeshop" test0
         = KMP False test2

test2 c = if c == 'o' then success3 else test0 c

success3 = makeTable' "eshop" (next (test0 'o'))
         = makeTable' "eshop" (next table0)
         = makeTable' "eshop" test0
         = KMP False test3

test3 c = if c == 'e' then success4 else test0 c

success4 = makeTable' "shop" (next (test0 'e'))
         = makeTable' "shop" (next table0)
         = makeTable' "shop" test0
         = KMP False test4

test4 c = if c == 's' then success5 else test0 c

success5 = makeTable' "hop" (next (test0 's'))
         = makeTable' "hop" (next success1)
         = makeTable' "hop" test1
         = KMP False test5

test5 c = if c == 'h' then success6 else test1 c

success6 = makeTable' "op" (next (test1 'h'))
         = makeTable' "op" (next success2)
         = makeTable' "op" test2
         = KMP False test6

test6 c = if c == 'o' then success7 else test2 c

success7 = makeTable' "p" (next (test2 'o'))
         = makeTable' "p" (next success3)
         = makeTable' "p" test3
         = KMP False test7

test7 c = if c == 'p' then success8 else test3 c

success8 = makeTable' "" (next (test3 'p'))
         = makeTable' "" (next (test0 'p'))
         = makeTable' "" (next table0)
         = makeTable' "" test0
         = KMP True test0

Note how success5 uses the consumed 's' to retrace the initial 's' of the pattern.

Now walk through what happens when you do isSubstringOf2 "shoeshop" $ cycle "shoe".

See that when test7 fails to match 'p', it falls back to test3 to try to match 'e', so that we cycle through success4, success5, success6 and success7 ad infinitum.

like image 55
rampion Avatar answered Oct 16 '22 02:10

rampion