Assuming that a general Trie of dictionary words is built, what would be the best method to check for the 4 cases of spelling mistakes - substitution, deletion, transposition and insertion during traversal?
One method is to generate all the words within edit distance n of a given word and then check for each of them in the Trie. This isn't a bad option, but a better intuition seems to be to use a dynamic programming (or an equivalent recursive) method that applies the modifications to the word during traversal and keeps only the sub-tries that can still lead to a match.
Any ideas would be welcome!
PS, would appreciate actual inputs rather than just links in answers.
Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly-spelled words.
Automatic spelling correction is important for many NLP applications like web search engines, text summarization, sentiment analysis etc. Most approaches use parallel data of noisy and correct word mappings from different sources as training data for automatic spelling correction.
I actually wrote some code to do this the other day:
https://bitbucket.org/teoryn/spell-checker/src/tip/spell_checker.py
It's based on the code by Peter Norvig (http://norvig.com/spell-correct.html) but stores the dictionary in a trie for finding words within a given edit distance faster.
The algorithm walks the trie recursively applying the possible edits (or not) at each step along the way by consuming letters from the input word. A parameter to the recursive call states how many more edits can be made. The trie helps narrow the search space by checking which letters can actually be reached from our given prefix. For example, when inserting a character, instead of adding each letter in the alphabet, we only add letters that are reachable from the current node. Not making an edit is equivalent to taking the branch from the current node in the trie along the current letter from the input word. If that branch is not there then we can backtrack and avoid searching a possibly large space where no real words could be found.
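To make the recursive walk above concrete, here is a minimal sketch. It is not the code from the linked repository: the nested-dict trie, the `END` marker, and the names `make_trie` and `search` are all assumptions made for illustration. It assumes the four edit types from the question and counts an adjacent swap as a single edit.

```python
END = ""  # hypothetical end-of-word marker; never equal to a single letter


def make_trie(words):
    """Build a nested-dict trie; the END key marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def search(node, rest, edits, prefix="", found=None):
    """Collect dictionary words reachable from `node` by consuming `rest`
    with at most `edits` further edits."""
    if found is None:
        found = set()
    if not rest and END in node:
        found.add(prefix)
    # No edit: follow the branch for the next input letter, if it exists.
    if rest and rest[0] in node:
        search(node[rest[0]], rest[1:], edits, prefix + rest[0], found)
    if edits == 0:
        return found  # nothing left to spend, so backtrack
    for ch, child in node.items():
        if ch == END:
            continue
        # The input is missing a letter: take a trie letter, consume nothing.
        search(child, rest, edits - 1, prefix + ch, found)
        # The input has a wrong letter: take a trie letter, consume one.
        if rest and ch != rest[0]:
            search(child, rest[1:], edits - 1, prefix + ch, found)
    if rest:
        # The input has an extra letter: skip it and stay on this node.
        search(node, rest[1:], edits - 1, prefix, found)
    if len(rest) > 1 and rest[1] in node and rest[0] in node[rest[1]]:
        # Two adjacent input letters are swapped: follow them in reverse order.
        swapped = rest[1] + rest[0]
        search(node[rest[1]][rest[0]], rest[2:], edits - 1, prefix + swapped, found)
    return found


trie = make_trie(["cat", "cart", "car", "act", "dog"])
print(sorted(search(trie, "cat", 1)))  # ['act', 'car', 'cart', 'cat']
print(sorted(search(trie, "cta", 1)))  # ['cat']
```

Only letters that actually hang off the current node are ever tried, so a branch with no matching words is abandoned as soon as the edit budget runs out.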
I think you can do this with a straightforward breadth-first search on the trie: choose a threshold for the number of errors you are willing to tolerate, then run through the letters of the word to be matched one at a time, keeping a set of (prefix, subtrie) pairs reached so far. While you are beneath your error threshold, add to your set of next subgoals:
This seems pretty naive: is there a problem with this that led you to think of dynamic programming?
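For what it's worth, here is a rough sketch of how that breadth-first version might look. The transitions it uses (exact match, wrong letter, extra letter, missing letter) are my assumption of what the subgoals would be, it ignores transpositions, and the names `make_trie`, `offer`, `add_missing_letters` and `bfs_matches` are made up for the example.

```python
END = ""  # hypothetical end-of-word marker


def make_trie(words):
    """Build a nested-dict trie; the END key marks a complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def offer(states, prefix, node, errors, max_errors):
    """Record (prefix, subtrie) if it is within budget and beats what we have."""
    if errors > max_errors:
        return False
    if prefix in states and states[prefix][1] <= errors:
        return False
    states[prefix] = (node, errors)
    return True


def add_missing_letters(states, max_errors):
    """Also reach states by taking trie letters the input word lacks."""
    pending = list(states.items())
    while pending:
        prefix, (node, errors) = pending.pop()
        for ch, child in node.items():
            if ch != END and offer(states, prefix + ch, child, errors + 1, max_errors):
                pending.append((prefix + ch, states[prefix + ch]))
    return states


def bfs_matches(root, word, max_errors):
    """Return {dictionary word: errors used} for words within max_errors."""
    # A prefix identifies its subtrie, so it serves as the key for the pair.
    frontier = add_missing_letters({"": (root, 0)}, max_errors)
    for letter in word:
        next_states = {}
        for prefix, (node, errors) in frontier.items():
            # Extra letter in the input: drop it and stay on the same subtrie.
            offer(next_states, prefix, node, errors + 1, max_errors)
            for ch, child in node.items():
                if ch != END:
                    # Matching letter is free; any other letter costs one error.
                    cost = 0 if ch == letter else 1
                    offer(next_states, prefix + ch, child, errors + cost, max_errors)
        frontier = add_missing_letters(next_states, max_errors)
    return {p: e for p, (node, e) in frontier.items() if END in node}


trie = make_trie(["cat", "cart", "car", "act", "dog"])
print(bfs_matches(trie, "cat", 1))
```

The frontier never holds more than one entry per trie prefix, and entries over the threshold are dropped immediately, which is what keeps the search small.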