
A StringToken Parser which gives Google Search style "Did you mean:" Suggestions

I'm looking for a method that takes whitespace-separated tokens in a String and returns suggested words.


For example:
Google Search can take "fonetic wrd nterpreterr",
and at the top of the results page it shows "Did you mean: phonetic word interpreter"

A solution in any of the C* languages or Java would be preferred.


Are there any existing open libraries which provide this functionality?

Or is there a way to use a Google API to request a suggested word?

Ande Turner asked Sep 25 '08 20:09



1 Answer

In his article How to Write a Spelling Corrector, Peter Norvig discusses how a Google-like spellchecker could be implemented. The article contains a 20-line implementation in Python, as well as links to several reimplementations in C, C++, C# and Java. Here is an excerpt:

The full details of an industrial-strength spell corrector like Google's would be more confusing than enlightening, but I figured that on the plane flight home, in less than a page of code, I could write a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.

Using Norvig's code and this text as the training set, I get the following results:

>>> import spellch
>>> [spellch.correct(w) for w in 'fonetic wrd nterpreterr'.split()]
['phonetic', 'word', 'interpreters']
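
For readers who want to see the shape of the approach without opening the article, here is a minimal sketch in the spirit of Norvig's corrector. It is not his code verbatim: `TRAINING_TEXT` is a tiny made-up corpus standing in for a real training file, and `suggest` is a hypothetical helper added to handle a whole whitespace-separated query. The idea is to generate every string within one or two simple edits of the input and pick the candidate seen most often in the training text.

```python
import re
from collections import Counter

# Hypothetical stand-in corpus (an assumption for this sketch);
# a usable corrector needs a large body of real text.
TRAINING_TEXT = """
the phonetic alphabet maps each word to a sound
an interpreter reads a word and runs it
a word is a word and the word is short
"""

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def words(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Word frequencies act as a crude language model.
WORD_COUNTS = Counter(words(TRAINING_TEXT))

def edits1(word):
    """All strings one delete, transpose, replace, or insert away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two simple edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(candidates):
    """Keep only the candidates that occur in the training text."""
    return {w for w in candidates if w in WORD_COUNTS}

def correct(word):
    """Prefer the word itself, then distance 1, then distance 2."""
    word = word.lower()
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or {word})
    return max(candidates, key=lambda w: WORD_COUNTS[w])

def suggest(query):
    """Correct each whitespace-separated token of the query."""
    return " ".join(correct(token) for token in query.split())

print("Did you mean:", suggest("fonetic wrd nterpreterr"))
# With the tiny corpus above this prints:
# Did you mean: phonetic word interpreter
```

The suggestions depend entirely on the training text: with this toy corpus the last token becomes 'interpreter', whereas a larger corpus can rank a different neighbour such as 'interpreters' higher, as in the session above.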
Constantin answered Sep 30 '22 20:09
