I am not able to understand the below algorithm which is used for string pattern matching using Aho-Corasick alg.
Procedure AC(y,n,q0)
INPUT: y<-array of m bytes representing the text input
(SQL Query Statement)
n<-integer representing the text length
(SQL Query Length)
q0<-initial state (first character in pattern)
2: State <-q0
3: For i = 1 to n do
4: While g ( State, y[i] = = fail) do
5: State ← f (State)
6: End While
7: State ← g(State,.y[i])
8: If o(State) then
9: Output i
10: Else
11: Output
12: End If
13: End for
14: End Procedure
You probably won't gain a good understanding of the Aho-Corasick algorithm from reading a little bit of pseudocode. Unless you understand the state transition table, the algorithm will make no sense at all.
There's a decent explanation along with an animation at Aho-Corasick implementation and animation.
The original paper, Efficient String Matching: An Aid to Bibliographic Search(PDF), is well written and understandable, and the pseudocode examples are pretty easy to convert to working code. It'll take a little study, but you should have a good understanding after you read the paper, think about it a bit, and then read it again.
Aho Corasick algorithm is used to solve set matching problem. It means we have a set of strings S, and here comes a long string L to check whether L contains any one in the previous set S.
An basic solution is using a trie tree, i.e. a prefix tree, please see Wikipedia. There are typically two steps to deal with the problem.
The trie tree is easy to understand. It store prefix substrings with nodes starts from the root.
The Aho-Corasick algorithm is an extension of a trie tree, not far from the basic idea. Aho-Corasick algorithm adds a failed pointer to each node on a trie tree.
When fails, a trie tree will restart from the root (add the start index on L by 1), but Aho-Corasick algorithm will restart from the node D pointed by the failed pointer (add the start index on L by the depth of the node D).
Below is a C++ implementation of the Aho-Corasick algorithm. It contains some bugs. http://www.komodia.com/aho-corasick
I fixed the bugs I found. And you can access my version here: https://github.com/elfinhe/KomodiaAhoCorasick
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With