I n my application i have upt o millions of short strings (mostly shorter than 32 characters). I want to implement a search box with a attached list that contains only elements that contain the whole string entered in the search box. How can i prebuild a index to find such strings fast? All sorted STL containers check the whole string.
For a entered search string "str" i need to find all strings containg "str": "main street", "struve", "ustr" etc.
You can build a Permuterm indexes.
For "struve" you would insert into a Radix tree (or a general purpose search tree):
struve$
truve$s
ruve$st
uve$str
ve$stru
e$struv
$struve
To search the infix you would search from the root node for matching prefix strings.
You might start by looking at trie's. Although they are mostly used as prefix trees, the data structure itself may be adapted for faster general search.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With