I'm basically benchmarking some high speed string matching algorithms, I came across a few.
Backwards Non-deterministic DAWG (Directed acyclic word graph) Matching algorithm by Gonzalo Navarro and Mathieu Raffinot. See "A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching"
Horspool's improved version of the Boyer-Moore String searching algorithm. See "Practical fast searching in strings"
Shift-Or algorithm with mismatches
KMP
Are there any other better high speed string matching algorithms i can try ?
Edit : There is another thread in similar lines , which has good references too
The Aho-Corasick string searching algorithm simultaneously finds all occurrences of multiple patterns in one pass through the text. On the other hand, the Boyer-Moore algorithm is understood to be the fastest algorithm for a single pattern.
Hashing-string matching algorithms: Rabin Karp Algorithm: It matches the hash value of the pattern with the hash value of current substring of text, and if the hash values match then only it starts matching individual characters.
The most important difference between them is how reliable they are in finding a match. KMP guarantees 100% reliability. You cannot guarantee 100% with Rabin Karp because of a chance of collision during hash table lookup.
BM25 [5] is the Best Matching ranking algorithm that ranks the documents according to its relevance to the given search query.
Two-way string matching is, to my knowledge, the best general-purpose algorithm for string matching. It has linear worst-case complexity, uses constant space, and doesn't backtrack too much more than necessary. And the theory behind it is very nice.
If you know that your users aren't jerks, naive string matching optimised for your architecture will win for short "needles" while a Boyer-Moore variant will start really doing the sublinear thing for long "needles." However, naive string matching has a quadratic worst case and Boyer-Moore can be made to examine all characters in the input. The extra tables needed to handle mismatches actually carry a surprisingly severe penalty over two-way string matching.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With