I have a trouble with understanding this implementation of the Knuth-Morris-Pratt algorithm in Haskell.
http://twanvl.nl/blog/haskell/Knuth-Morris-Pratt-in-Haskell
In particular I don't understand the construction of the automaton. I know that it uses the "Tying the Knot" method to construct it, but it isn't clear to me and I also don't know why it should have the right complexity.
Another thing I would like to know is whether you think that this implementation could be easily generalized to implement the Aho-Corasick algorithm.
Thanks for your answers!
KMP algorithm is used to find a "Pattern" in a "Text". This algorithm campares character by character from left to right. But whenever a mismatch occurs, it uses a preprocessed table called "Prefix Table" to skip characters comparison while matching.
Knuth Morris Pratt (KMP) is an algorithm, which checks the characters from left to right. When a pattern has a sub-pattern appears more than one in the sub-pattern, it uses that property to improve the time complexity, also for in the worst case. The time complexity of KMP is O(n).
The KMP algorithm is a solution to the string search problem wherein we are required to find if a given pattern string occurs in another main string. It is one of the advanced string matching algorithm that was conceived by Donald Knuth, James H. Morris and Vaughan Pratt, hence the name "KMP algorithm".
The KMP algorithm is also a matter of dynamic programming. Our public account article directory has a series of articles that specialize in dynamic programming, and all are based on a set of frameworks.
So here's the algorithm:
makeTable :: Eq a => [a] -> KMP a
makeTable xs = table
where table = makeTable' xs (const table)
makeTable' [] failure = KMP True failure
makeTable' (x:xs) failure = KMP False test
where test c = if c == x then success else failure c
success = makeTable' xs (next (failure x))
Using that, let's look at the table constructed for "shoeshop":
makeTable "shoeshop" = table0
table0 = makeTable' "shoeshop" (const table0)
= KMP False test0
test0 c = if c == 's' then success1 else const table0 c
= if c == 's' then success1 else table0
success1 = makeTable' "hoeshop" (next (const table0 's'))
= makeTable' "hoeshop" (next table0)
= makeTable' "hoeshop" test0
= KMP False test1
test1 c = if c == 'h' then success2 else test0 c
success2 = makeTable' "oeshop" (next (test0 'h'))
= makeTable' "oeshop" (next table0)
= makeTable' "oeshop" test0
= makeTable' "oeshop" test0
= KMP False test2
test2 c = if c == 'o' then success3 else test0 c
success3 = makeTable' "eshop" (next (test0 'o'))
= makeTable' "eshop" (next table0)
= makeTable' "eshop" test0
= KMP False test3
test3 c = if c == 'e' then success4 else test0 c
success4 = makeTable' "shop" (next (test0 'e'))
= makeTable' "shop" (next table0)
= makeTable' "shop" test0
= KMP False test4
test4 c = if c == 's' then success5 else test0 c
success5 = makeTable' "hop" (next (test0 's'))
= makeTable' "hop" (next success1)
= makeTable' "hop" test1
= KMP False test5
test5 c = if c == 'h' then success6 else test1 c
success6 = makeTable' "op" (next (test1 'h'))
= makeTable' "op" (next success2)
= makeTable' "op" test2
= KMP False test6
test6 c = if c == 'o' then success7 else test2 c
success7 = makeTable' "p" (next (test2 'o'))
= makeTable' "p" (next success3)
= makeTable' "p" test3
= KMP False test7
test7 c = if c == 'p' then success8 else test3 c
success8 = makeTable' "" (next (test3 'p'))
= makeTable' "" (next (test0 'p'))
= makeTable' "" (next table0)
= makeTable' "" test0
= KMP True test0
Note how success5
uses the consumed 's' to retrace the initial 's' of the pattern.
Now walk through what happens when you do isSubstringOf2 "shoeshop" $ cycle "shoe"
.
See that when test7
fails to match 'p', it falls back to test3
to try to match 'e', so that we cycle through success4
, success5
, success6
and success7
ad infinitum.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With