Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient Data Structure For Substring Search?

Assume I have a set of strings S and a query string q. I want to know if any member of S is a substring of q. (For the purpose of this question substring includes equality, e.g. "foo" is a substring of "foo".) For example assume the function that does what I want is called anySubstring:

S = ["foo", "baz"]
q = "foobar"
assert anySubstring(S, q)  # "foo" is a substring of "foobar"

S = ["waldo", "baz"]
assert not anySubstring(S, q)

Is there any easy-to-implement algorithm to do this with time complexity sublinear in len(S)? It's ok if S has to be processed into some clever data structure first because I will be querying each S with a lot of q strings, so the amortized cost of this preprocessing might be reasonable.

EDIT: To clarify, I don't care which member of S is a substring of q, only whether at least one is. In other words, I only care about a boolean answer.

like image 314
dsimcha Avatar asked Mar 09 '12 15:03

dsimcha


3 Answers

I think Aho-Corasick algorithm does what you want. I think there is another solution which is very simple to implement, it's Karp-Rabin algorithm.

like image 148
aelguindy Avatar answered Oct 15 '22 00:10

aelguindy


So if the length of S is way less then the sum of the lengths of the potential substrings your best option would be to build a suffix tree from S and then do a search in it. This is linear with respect to the length of S plus the summar length of the candidate substrings. Of course there can not be an algorithm with better complexity as you have to pass through all the input at least. If the case is opposite i.e. the length of s is more then the summar length of the substrings your best option would be aho-corasick.

Hope this helps.

like image 22
Ivaylo Strandjev Avatar answered Oct 15 '22 01:10

Ivaylo Strandjev


Create a regular expression .*(S1|S2|...|Sn).* and construct its minimal DFA.

Run your query string through the DFA.

like image 35
Dimple Kapadia Avatar answered Oct 15 '22 01:10

Dimple Kapadia