I have a word list (a dictionary) and a text that contains spelling errors (typos). I want to correct each misspelled word so that it matches an entry in the word list.
For example, given this word list:
listOfWord = [...,"halo","saya","sedangkan","semangat","cemooh"..];
and this string:
string = "haaallllllooo ssya sdngkan ceemoooh , smngat semoga menyenangkan"
I want to correct the spelling errors so that the result is:
string = "halo saya sedangkan cemooh, semangat semoga menyenangkan"
What is the best algorithm for checking each word against the list? The list contains millions of words, and each word can be misspelled in many ways.
Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly-spelled words.
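As a minimal sketch of the Levenshtein approach, the Python below computes the edit distance with the usual dynamic-programming table (kept to two rows) and picks the closest dictionary word. The names `levenshtein` and `closest_word` are illustrative, and the linear scan over the dictionary is an assumption made for brevity; it would be too slow for millions of words without further indexing.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_word(word, dictionary):
    """Return the dictionary entry with the smallest edit distance to word."""
    return min(dictionary, key=lambda w: levenshtein(word, w))


words = ["halo", "saya", "sedangkan", "semangat", "cemooh"]
print(closest_word("ssya", words))
print(closest_word("smngat", words))
```

Heavily stretched inputs such as "haaallllllooo" are far from "halo" in edit distance, so collapsing repeated letters before matching may help; that preprocessing step is a suggestion, not part of the algorithm itself.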
Collect a dictionary of all the valid words in a language, along with their frequency in the language (i.e. a unigram language model). Generate a list of candidate edits of the word you want to correct by applying insertions, deletions, transpositions of adjacent characters, and character replacements. Keep the candidates that appear in the dictionary and choose the most frequent one.
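The candidate-generation idea can be sketched roughly as follows. The frequency counts in `freq` are made up for illustration, the alphabet is assumed to be lowercase ASCII, and `edits1` and `correct` are hypothetical names rather than functions from any library.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings that are one edit away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(word, freq):
    """Return the most frequent known word within two edits, else word itself."""
    if word in freq:
        return word
    # Try candidates one edit away first.
    candidates = [w for w in edits1(word) if w in freq]
    if not candidates:
        # Fall back to candidates two edits away.
        candidates = [w2 for w1 in edits1(word)
                      for w2 in edits1(w1) if w2 in freq]
    return max(candidates, key=freq.get) if candidates else word


# Hypothetical frequency counts, for illustration only.
freq = {"halo": 50, "saya": 120, "sedangkan": 30, "semangat": 40, "cemooh": 5}
print(correct("ssya", freq))
print(correct("sdngkan", freq))
```

Generating every edit grows quickly with word length and edit distance, which is one reason precomputed indexes are often preferred for very large dictionaries.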
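Symmetric Delete Spelling Correction (SymSpell) is a simple but useful approach to correcting spelling errors. It precomputes the deletion variants of every dictionary word, so at lookup time only deletions of the input word need to be generated.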
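A rough sketch of the symmetric-delete idea, not the SymSpell library's actual API: every dictionary word is indexed under all strings obtained by deleting up to `max_distance` characters, and a lookup generates the same deletions for the input word. The function names `deletes`, `build_index`, and `lookup` are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_distance):
    """word itself plus every string made by deleting up to max_distance characters."""
    results = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for positions in combinations(range(len(word)), k):
            drop = set(positions)
            results.add("".join(c for i, c in enumerate(word) if i not in drop))
    return results


def build_index(dictionary, max_distance=2):
    """Map each deletion variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for word in dictionary:
        for variant in deletes(word, max_distance):
            index[variant].add(word)
    return index


def lookup(word, index, max_distance=2):
    """Dictionary words that share a deletion variant with word."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    return candidates


index = build_index(["halo", "saya", "sedangkan", "semangat", "cemooh"])
print(lookup("ssya", index))
print(lookup("smngat", index))
```

Sharing a deletion variant does not guarantee that a candidate is within `max_distance` edits, so in practice the candidates would still be verified with a true edit-distance check and ranked, for example by word frequency.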
It depends on how your data is stored, but you'll probably want to use a pattern matching algorithm like Aho–Corasick. Of course, that assumes your input data structure is a trie. A trie is a very space-efficient storage container for words that may also be of interest to you (again, depending on your environment).
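For reference, a trie can be sketched in a few lines of Python using nested dictionaries. This shows only the storage structure with exact and prefix lookups; it is not an implementation of Aho–Corasick, and the class and method names are illustrative.

```python
class Trie:
    """Prefix tree storing words as paths of single-character edges."""

    _END = ""  # end-of-word marker; cannot collide with a one-character key

    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})  # shared prefixes reuse nodes
        node[self._END] = True

    def _walk(self, text):
        """Follow text from the root; return the final node or None."""
        node = self.root
        for ch in text:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and self._END in node

    def starts_with(self, prefix):
        return self._walk(prefix) is not None


trie = Trie()
for w in ["halo", "saya", "sedangkan", "semangat", "cemooh"]:
    trie.insert(w)
print(trie.contains("halo"), trie.contains("hal"), trie.starts_with("se"))
```

Because words with a common prefix share nodes ("sedangkan" and "semangat" share "se" here), approximate-matching searches can prune whole branches at once instead of testing each word separately.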