I have generated an string using the following alphabet. {A,C,G,T}
. And my string contains more than 10000 characters. I'm searching the following patterns in it.
I have asked to use a string matching algorithm which has O(m+n)
running time.
m = pattern length n = text length
Both KMP and Rabin-Karp algorithms
have this running time. What is the most suitable algorithm (between Rabin-Carp and KMP) in this situation?
The most important difference between them is how reliable they are in finding a match. KMP guarantees 100% reliability. You cannot guarantee 100% with Rabin Karp because of a chance of collision during hash table lookup.
Rabin-Karp algorithm is an algorithm used for searching/matching patterns in the text using a hash function. Unlike Naive string matching algorithm, it does not travel through every character in the initial phase rather it filters the characters that do not match and then performs the comparison.
The Karp-Rabin Algorithm.
KMP Algorithm has a guaranteed worst-case linear-time performance, i.e. both time and space complexity: O(m + n) in worst case. But, Boyer Moore can be much faster than KMP, but only on certain inputs that allow many characters to be skipped (similar to the example in the 2nd point).
When you want to search for multiple patterns, typically the correct choice is to use Aho-Corasick, which is somewhat a generalization of KMP. Now in your case you are only searching for 3 patterns so it may be the case that KMP is not that much slower(at most three times), but this is the general approach.
Rabin-Karp is easier to implement if we assume that a collision will never happen, but if the problem you have is a typical string searching KMP will be more stable no matter what input you have. However, Rabin-Karp has many other applications, where KMP is not an option.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With