Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fast index for "contains string"

Tags:

c++

algorithm

stl

I n my application i have upt o millions of short strings (mostly shorter than 32 characters). I want to implement a search box with a attached list that contains only elements that contain the whole string entered in the search box. How can i prebuild a index to find such strings fast? All sorted STL containers check the whole string.

For a entered search string "str" i need to find all strings containg "str": "main street", "struve", "ustr" etc.

like image 469
RED SOFT ADAIR Avatar asked Jan 18 '10 10:01

RED SOFT ADAIR


2 Answers

You can build a Permuterm indexes.

For "struve" you would insert into a Radix tree (or a general purpose search tree):

struve$
truve$s
ruve$st
uve$str
ve$stru
e$struv
$struve

To search the infix you would search from the root node for matching prefix strings.

like image 128
Thomas Jung Avatar answered Oct 12 '22 13:10

Thomas Jung


You might start by looking at trie's. Although they are mostly used as prefix trees, the data structure itself may be adapted for faster general search.

like image 20
Kornel Kisielewicz Avatar answered Oct 12 '22 13:10

Kornel Kisielewicz