I am currently learning about pattern matching algorithms and have come across these two algorithms. I have the following general ideas:
KMP
BM
I came across the following question which triggered this question(True or False):
The Knuth-Morris-Pratt (KMP) algorithm is a good choice if we want to search for the same pattern repeatedly in many different texts.
So I believe the answer is true just because the assumption is that every time you run the algorithm on different text the preprocessing is only O(n) where for BM it is O(n + size of alphabet). However, I am not sure if I am making the correct assumption that every time the algorithm is rerun a new table is recomputed. Because say the text always falls in the alphabet of english. I would only need to compute the table once and just reuse the table. So at the end of the day, would the answer to this question be dependent on the fact that the algorithms are all being run on text which is contained in the same alphabet or is there some other factor which may affect it?
KMP Algorithm has a guaranteed worst-case linear-time performance, i.e. both time and space complexity: O(m + n) in worst case. But, Boyer Moore can be much faster than KMP, but only on certain inputs that allow many characters to be skipped (similar to the example in the 2nd point).
Overview. KMP algorithm is used to find a "Pattern" in a "Text". This algorithm campares character by character from left to right. But whenever a mismatch occurs, it uses a preprocessed table called "Prefix Table" to skip characters comparison while matching.
In real world KMP algorithm is used in those applications where pattern matching is done in long strings, whose symbols are taken from an alphabet with little cardinality. A relevant example is the DNA alphabet, which consists on only 4 symbols (A,C,G,T).
Horspool's algorithm only used the character value of the text aligned with last character of the pattern to determine the shift. Boyer-Moore algorithm also uses the location and the character mismatch to calculate the shift. In addition it uses the occurrences of the suffixes in the pattern to determine the shift.
In theory, both algorithms will have "similar" performance; KMP will do about 2n comparisons in the searching phase and Boyer-Moore will do about 3n comparisons in the searching phase in the worst case. In neither case do you need to repeat the preprocessing when you get a new text.
But the real answer is that you shouldn't use either one in practice.
The linear auxiliary storage needed by both algorithms leads to considerably...rougher performance on modern architectures because of all of the extra memory accesses.
However, the ideas behind Boyer-Moore and KMP underpin most fast string matching algorithms. Something like KMP's "failure function" idea is used by every practically effective string matching algorithm I know of; it turns out that you can compute a suboptimal "failure function" for a pattern on-the-fly that still gives you linear time matching while only needing constant additional space. Boyer-Moore is faster than linear in the "average case" of matching a fixed pattern against random noise, and this bears itself out in many practical situations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With