Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When would you use KMP over BOYER-MOORE

I am currently learning about pattern matching algorithms and have come across these two algorithms. I have the following general ideas:

KMP

  • Compares text left-to-right
  • Uses a failure array to shift intelligently
  • takes O(m), where m is the length of the pattern, to compute failure array
  • takes O(m), space
  • takes O(n), time to search a string

BM

  • Compares pattern from last character
  • Uses bad character jumps and good suffix jumps
  • takes O(m + size of alphabet) to compute tables
  • takes O(m + size of alphabet), space
  • takes O(n), but usually better to search

I came across the following question which triggered this question(True or False):

The Knuth-Morris-Pratt (KMP) algorithm is a good choice if we want to search for the same pattern repeatedly in many different texts.

So I believe the answer is true just because the assumption is that every time you run the algorithm on different text the preprocessing is only O(n) where for BM it is O(n + size of alphabet). However, I am not sure if I am making the correct assumption that every time the algorithm is rerun a new table is recomputed. Because say the text always falls in the alphabet of english. I would only need to compute the table once and just reuse the table. So at the end of the day, would the answer to this question be dependent on the fact that the algorithms are all being run on text which is contained in the same alphabet or is there some other factor which may affect it?

like image 436
Eric Avatar asked Apr 18 '13 14:04

Eric


People also ask

Is KMP better than Boyer Moore?

KMP Algorithm has a guaranteed worst-case linear-time performance, i.e. both time and space complexity: O(m + n) in worst case. But, Boyer Moore can be much faster than KMP, but only on certain inputs that allow many characters to be skipped (similar to the example in the 2nd point).

Why do we use KMP algorithm?

Overview. KMP algorithm is used to find a "Pattern" in a "Text". This algorithm campares character by character from left to right. But whenever a mismatch occurs, it uses a preprocessed table called "Prefix Table" to skip characters comparison while matching.

Where is KMP used?

In real world KMP algorithm is used in those applications where pattern matching is done in long strings, whose symbols are taken from an alphabet with little cardinality. A relevant example is the DNA alphabet, which consists on only 4 symbols (A,C,G,T).

What is the difference between horspool and Boyer Moore?

Horspool's algorithm only used the character value of the text aligned with last character of the pattern to determine the shift. Boyer-Moore algorithm also uses the location and the character mismatch to calculate the shift. In addition it uses the occurrences of the suffixes in the pattern to determine the shift.


1 Answers

In theory, both algorithms will have "similar" performance; KMP will do about 2n comparisons in the searching phase and Boyer-Moore will do about 3n comparisons in the searching phase in the worst case. In neither case do you need to repeat the preprocessing when you get a new text.

But the real answer is that you shouldn't use either one in practice.

The linear auxiliary storage needed by both algorithms leads to considerably...rougher performance on modern architectures because of all of the extra memory accesses.

However, the ideas behind Boyer-Moore and KMP underpin most fast string matching algorithms. Something like KMP's "failure function" idea is used by every practically effective string matching algorithm I know of; it turns out that you can compute a suboptimal "failure function" for a pattern on-the-fly that still gives you linear time matching while only needing constant additional space. Boyer-Moore is faster than linear in the "average case" of matching a fixed pattern against random noise, and this bears itself out in many practical situations.

like image 75
tmyklebu Avatar answered Oct 05 '22 21:10

tmyklebu