Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String searching algorithms

For the two string searching algorithms: KMP and suffix tree, which is preferred in which cases? Give some practical examples.

like image 976
avd Avatar asked Apr 10 '10 11:04

avd


People also ask

What is the best string search algorithm?

Results: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested. Conclusion: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used.

What is the fastest string search algorithm?

The Aho-Corasick string searching algorithm simultaneously finds all occurrences of multiple patterns in one pass through the text. On the other hand, the Boyer-Moore algorithm is understood to be the fastest algorithm for a single pattern.

What is the algorithm used for pattern searching?

We use certain algorithms called pattern recognition to do the searching process. The complexity of pattern searching is O(m(n-m+1)). The algorithms are also known as String Searching Algorithms. These are very useful while we perform the search operation in the database.

What are the 2 types of searching algorithms?

In searching, there are two types: sequential search and interval search. Almost every search algorithm falls into one of these two categories. Linear and binary searches are two simple and easy-to-implement algorithms, with binary algorithms performing faster than linear algorithms.


1 Answers

A suffix tree is better if you will have to answer a lot of queries such as "is the needle present in the haystack?". KMP is better if you only have to search for one string in another single string, and not have to do it a lot of times.

A suffix tree is a much more general data structure, so you can do a lot more with it. See what you can do with it here. KMP is useful for finding if a string is a substring in another string.

You might also want to check out other algorithms, such as Boyer-Moore, Rabin-Karp and even the naive algorithm, as there are situations (inputs) in which one is better than the others.

Bottom line is:

  1. If you have a lot of queries like the one I mentioned above, it's worth to build a suffix tree and then answer each query faster.
  2. If you need to do more than those kinds of queries, a suffix tree is also worth building.
  3. If you only care about occasionally finding if a string is a substring of another string, then use KMP.
like image 167
IVlad Avatar answered Oct 04 '22 20:10

IVlad