Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Build an index for substring search?

I want to do general substring search among billions of strings. The requirement is a little different from general fulltext search because I want a query "ubst" also can hit "substr".

Is Lucene or Sphinx capable of doing this? If not, what's the best way do you think to do this?

like image 650
Chen Xing Avatar asked Jul 27 '11 02:07

Chen Xing


People also ask

Is there indexing for strings?

Because strings, like lists and tuples, are a sequence-based data type, it can be accessed through indexing and slicing.

How do you find the index of a substring in a string in Python?

Python String find() method returns the lowest index or first occurrence of the substring if it is found in a given string. If it is not found, then it returns -1.

How do I find the index of a word in a string?

The indexOf() method returns the position of the first occurrence of specified character(s) in a string. Tip: Use the lastIndexOf method to return the position of the last occurrence of specified character(s) in a string.


1 Answers

Best index structure for this case is suffix tree Lucene does not implements this type of index so its substring search is slow. But lucene has prefix tree index which mean you can do fast search if you search terms by their prefix.

like image 118
yura Avatar answered Sep 25 '22 15:09

yura