Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to correct the user input (Kind of google "did you mean?")

I have the following requirement: -

I have many (say 1 million) values (names). The user will type a search string.

I don't expect the user to spell the names correctly.

So, I want to make kind of Google "Did you mean". This will list all the possible values from my datastore. There is a similar but not same question here. This did not answer my question.

My question: - 1) I think it is not advisable to store those data in RDBMS. Because then I won't have filter on the SQL queries. And I have to do full table scan. So, in this situation how the data should be stored?

2) The second question is the same as this. But, just for the completeness of my question: how do I search through the large data set? Suppose, there is a name Franky in the dataset. If a user types as Phranky, how do I match the Franky? Do I have to loop through all the names?

I came across Levenshtein Distance, which will be a good technique to find the possible strings. But again, my question is do I have to operate on all 1 million values from my data store?

3) I know, Google does it by watching users behavior. But I want to do it without watching user behavior, i.e. by using, I don't know yet, say distance algorithms. Because the former method will require large volume of searches to start with!

4) As Kirk Broadhurst pointed out in an answer below, there are two possible scenarios: -

  • Users mistyping a word (an edit distance algorithm)
  • Users not knowing a word and guessing (a phonetic match algorithm)

I am interested in both of these. They are really two separate things; e.g. Sean and Shawn sound the same but have an edit distance of 3 - too high to be considered a typo.

like image 1000
Sabya Avatar asked Aug 16 '09 17:08

Sabya


People also ask

How does Google search did you mean work?

According to Google, the search feature follows a pre-determined process : a query is initiated, the web is navigated by following links from one page to another (an operation termed web crawling), applicable pages are sorted using a set of criteria and indexes, and the most suitable results delivered which are ...

How does Google auto correct work?

Autocomplete is a feature within Google Search that makes it faster to complete searches that you start to type. Our automated systems generate predictions that help people save time by allowing them to quickly complete the search they already intended to do.

Did you mean searching algorithm?

The Did You Mean algorithm works separately on every term within the query. If a search query returns less than 50 results, the Did You Mean algorithm performs the following: Primo searches for a match in the Did You Mean index. Several candidates are checked and the highest-ranking result is used.

How does Google detect typo?

Google figures out possible misspellings and their likely correct spellings by using words it finds while searching the web and processing user queries.


1 Answers

The Soundex algorithm may help you out with this.

http://en.wikipedia.org/wiki/Soundex

You could pre-generate the soundex values for each name and store it in the database, then index that to avoid having to scan the table.

like image 85
Dave Carlile Avatar answered Sep 24 '22 12:09

Dave Carlile