What is the best algorithm for closest word.
Possible word dictionary is given and first characters in the input word can be wrong.
One option is BK-trees - see my blog post about them here. Another, faster but more complex option is Levenshtein Automata, which I've also written about, here.
There are tools such as HunSpell (open-source spell-checker widely including OpenOffice) which have approached the problem from multiple perspectives. One widely used criterion for deciding how close the words are is Levenshtein distance which is also used in HunSpell.
You could use BLAST
and modify it to use the fact that words in a dictionary are discrete units which makes the process of matching more specific unlike a long DNA string.
BLAST already has built into it the notion of edit distances.
Alternatively, you could use suffix trees (Dan Gusfeld has an excellent book on basic string matching algorithms) and build in the idea of edit distances in.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With