I want to do general substring search among billions of strings. The requirement is a little different from general fulltext search because I want a query "ubst" also can hit "substr".
Is Lucene or Sphinx capable of doing this? If not, what's the best way do you think to do this?
Because strings, like lists and tuples, are a sequence-based data type, it can be accessed through indexing and slicing.
Python String find() method returns the lowest index or first occurrence of the substring if it is found in a given string. If it is not found, then it returns -1.
The indexOf() method returns the position of the first occurrence of specified character(s) in a string. Tip: Use the lastIndexOf method to return the position of the last occurrence of specified character(s) in a string.
Best index structure for this case is suffix tree Lucene does not implements this type of index so its substring search is slow. But lucene has prefix tree index which mean you can do fast search if you search terms by their prefix.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With