How can you implement the "Did you mean: " like Google does in some search queries?
PS: I am using sphinx in my product. Can you suggest how can I implement this. Any guides or suggestions for some other search engines who has this functionality are most welcomed. I am using rails2.3.8, if that helps
One Solution can be:
Make a dictionary of known "keywords" or "phrases", and in search action if nothing is found then run a secondary query in that dictionary. Update that dictionary whenever a searchable entry is created say, a blog post or username.
query = "supreman"
dictionary = ["superman", "batman", "hanuman" ...] (in DB table)
search(query)
if no results, then
search in dictionary (where "keyword" LIKE query or "phrase" LIKE query) => "superman"
Check in sphinx or solr documentation. They might have a better implementation of this "Like" query which returns a % match.
But the point is how to make it efficient?
Have a look at the Damerau-Levenshtein distance algorithm. It calculates the "distance" between two strings and determines how many steps it takes to transform one string into another. The less steps the closer the two strings are.
This article shows the algorithm implemented as a MySQL stored function.
The algorithm is so much better than LIKE or SOUNDEX.
I believe Google uses crowd sourced data rather than an algorithm. ie if a user types in abcd, clicks on the back button and then immediately searches for abd then it establishes a relationship between the two search terms as the user wasn't happy with the results. Once you have a very large community searching then the pattern appears.
You should take a look at the actual theory of how Google implements something like this: How to Write a Spelling Corrector.
Although that article is written in Python, there are links to implementations in other languages at the bottom of the article. Here is a Ruby implementation.
I think you're looking for a string match algorithms.
I remember mislav's gist used to raise errors when initialize was slightly misspelled. That might be a good read.
Also, take a look at some of the articles he suggests:
Now a days did you mean feature is implemented based on phonetic spell corrector. When we misspell we generally write phonetically similar words. Based on this idea phonetic spell corrector searches its database for the most similar word. Similarity ties are broken using context(for a multi-word query other words also help in deciding the correct word) and popularity of the word. If two words are phonetically very close to the misspelled word than the word which fits the context and is more frequently used in daily life is chosen.
this is working for me:
SELECT * FROM table_name WHERE soundex(field_name) LIKE CONCAT('%', soundex('searching_element'), '%')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With