I'm implementing search-suggestion functionality in my web app and have been looking at existing implementations for the techniques in use.
It seems as though most of the major sites (Amazon, Bing, etc.) implement fuzzy search in the following way:
Tokenize search string into terms
processingSearchStringSet = {all indexed strings}
For each term:
    if exact term is NOT in index:
        Get possible terms (fuzzyTerms) from levenshtein(term, 1 (or 2))
        fuzzyMatchSet = {}
        For each fuzzyTerm in fuzzyTerms:
            if fuzzyTerm is in index:
                fuzzyMatchSet.union(stringsIndexedByFuzzyTermSet)
        processingSearchStringSet.intersect(fuzzyMatchSet)
    else:
        processingSearchStringSet.intersect(stringsIndexedByTermSet)
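The scheme above can be sketched in runnable Python. Everything here is illustrative: the toy inverted index (term → set of suggestion ids), the function names, and the brute-force scan for fuzzy candidates are assumptions of the sketch, not any site's actual implementation.

```python
def levenshtein(a, b):
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def suggest(query, index, max_errors=1):
    result = None  # None means "no terms processed yet"
    for term in query.lower().split():
        if term in index:
            term_set = set(index[term])
        else:
            # Union the postings of every index term within the error budget.
            term_set = set()
            for candidate in index:
                if levenshtein(term, candidate) <= max_errors:
                    term_set |= index[candidate]
        result = term_set if result is None else result & term_set
        if not result:
            break  # one unmatched term empties the whole result set
    return result or set()

# Toy index: term -> ids of the suggestion strings containing it.
index = {
    "fifty":  {1, 2},
    "shades": {1, 3},
    "of":     {1, 2, 3},
    "grey":   {1},
}
print(suggest("fiftu shades of grey", index))  # -> {1}
```

Note that with `max_errors=1`, a query like "fiftyyyy shades of grey" (three errors in one term) returns the empty set, which matches the Amazon behavior described below.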
The result set members are then presumably ranked by metrics (e.g. term-order preservation, absolute term location, search popularity) and kept or eliminated based on this ranking and a pre-determined result set size before being delivered back to the user.
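To make that ranking step concrete, here is a hedged sketch: candidates are scored by made-up proxies for the metrics mentioned (absolute position of the query term, then search popularity as a tie-breaker) and the list is cut to a fixed size. The popularity table and scoring weights are invented for illustration.

```python
def rank_suggestions(candidates, query_terms, popularity, k=10):
    def score(suggestion):
        terms = suggestion.lower().split()
        # An earlier absolute position of the first query term ranks higher...
        first_pos = min((terms.index(t) for t in query_terms if t in terms),
                        default=len(terms))
        # ...and higher search popularity breaks ties (negated for ascending sort).
        return (first_pos, -popularity.get(suggestion, 0))
    return sorted(candidates, key=score)[:k]

popularity = {"fifty shades of grey": 1000, "grey fifty shades": 50}
print(rank_suggestions({"fifty shades of grey", "grey fifty shades"},
                       ["fifty"], popularity))
# -> ['fifty shades of grey', 'grey fifty shades']
```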
Google's implementation, on the other hand, differs quite a bit from this.
Specifically, it allows more than one error in the search string's constituent terms. The error threshold seems to depend on where the term of interest is in the string, though it never exceeds 7.
What's interesting is that n-grams don't seem to be in use either: modifying a term so that it contains no bigram present in the original term does not seem to affect the result(s).
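To make that probe concrete, here is the kind of check involved: a character-bigram index matches terms by shared two-character substrings, so a modification that destroys every shared bigram should break such an index. The interleaved-`x` modification below is an illustrative example, not the exact query used.

```python
def bigrams(term):
    """The set of character bigrams of a term, as a typical n-gram index would extract."""
    return {term[i:i + 2] for i in range(len(term) - 1)}

original = bigrams("grey")      # {'gr', 're', 'ey'}
modified = bigrams("gxrxexy")   # every original bigram broken by an inserted 'x'
print(original & modified)      # set() -- no bigram overlap at all
```

If Google still recovers "grey" from a query term like "gxrxexy", a bigram index alone can't explain the match.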
Here's an example to illustrate my findings:
Example search string: "Fiftyyyy shades of grey"
Amazon suggestions: none
(if the error count exceeds 1 on any term, the search fails)
Bing suggestions: none
(if the error count exceeds 2 on any term, the search fails)
Google suggestions: 10 (max)
(breaking the search would require 5 or more errors on any single term,
or multiple errors on multiple terms)
My question is: what type of sorcery is at work here? Are they just using a Levenshtein search with a huge error allowance, or do they use another technique I am unaware of?
Maybe you should try this approach: http://norvig.com/spell-correct.html
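For reference, the core of that page's approach fits in a few lines: generate every string within edit distance 1 (or 2) of the misspelled word and pick the known candidate with the highest count in a training corpus. The tiny `WORDS` counter below is a stand-in for the large text file the article actually trains on.

```python
from collections import Counter

# Stand-in corpus; Norvig's article trains on a large text file instead.
WORDS = Counter("fifty shades of grey fifty shades darker".split())

def edits1(word):
    """All strings within edit distance 1 of `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def known(words):
    return {w for w in words if w in WORDS}

def correction(word):
    # Prefer the word itself, then distance-1 edits, then distance-2 edits.
    candidates = (known([word]) or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=WORDS.get)

print(correction("fiftyy"))  # -> 'fifty'
```

Note this still caps out at two edits per term, so by itself it wouldn't reproduce Google's tolerance in the example above; it would need to be combined with other signals (query logs, context).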