Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best algorithm for closest word

Tags:

algorithm

What is the best algorithm for closest word.

Possible word dictionary is given and first characters in the input word can be wrong.

like image 229
Avinash Avatar asked Aug 30 '10 19:08

Avinash


3 Answers

One option is BK-trees - see my blog post about them here. Another, faster but more complex option is Levenshtein Automata, which I've also written about, here.

like image 89
Nick Johnson Avatar answered Sep 22 '22 17:09

Nick Johnson


There are tools such as HunSpell (open-source spell-checker widely including OpenOffice) which have approached the problem from multiple perspectives. One widely used criterion for deciding how close the words are is Levenshtein distance which is also used in HunSpell.

like image 28
Leonid Avatar answered Sep 25 '22 17:09

Leonid


You could use BLAST

and modify it to use the fact that words in a dictionary are discrete units which makes the process of matching more specific unlike a long DNA string.

BLAST already has built into it the notion of edit distances.

Alternatively, you could use suffix trees (Dan Gusfeld has an excellent book on basic string matching algorithms) and build in the idea of edit distances in.

like image 20
venky Avatar answered Sep 23 '22 17:09

venky