Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest algorithm to find a string in an array of strings?

Tags:

algorithm

This question is merely about algorithm. In pseudo code is like this:

A = Array of strings; //let's say count(A)  = N
S = String to find;   //let's say length(S) = M

for (Index=0; Index<count(A); Index++)
  if (A[Index]==S) {
    print "First occurrence at index\x20"+Index;
    break;
  }

This for loop requires string comparison N times (or byte comparison N*M times, O(N*M)). This is bad when array A has lots of items, or when string S is too long.

Any better method to find out the first occurrence? Some algorithm at O(K*logK) is OK, but preferable at O(K) or best at O(logK), where K is either N or M.

I don't mind adding in some other structures or doing some data processing before the comparison loop.

like image 415
jondinham Avatar asked Apr 28 '12 18:04

jondinham


2 Answers

You could convert the whole array of strings to a finite state machine, where the transitions are the characters of the strings and put the smallest index of the strings that produced a state into the state. This takes a lot of time, and may be considered indexing.

like image 65
Reactormonk Avatar answered Oct 14 '22 19:10

Reactormonk


Put the strings into a hash based set, and test to see if a given string is contained in the set should give you more or less constant performance once the set is built.

like image 31
Bill Avatar answered Oct 14 '22 19:10

Bill