Best algorithm to correct typos in text

I have a word list (a dictionary) and a text that contains spelling errors (typos). I want to correct each misspelled word so that it matches a word in the list.

For example, given this word list:

listOfWord = [..., "halo", "saya", "sedangkan", "semangat", "cemooh", ...];

This is my string:

string = "haaallllllooo ssya sdngkan ceemoooh , smngat semoga menyenangkan"

I want to correct the spelling errors so that the string becomes:

string = "halo saya sedangkan cemooh, semangat semoga menyenangkan"

What is the best algorithm for checking each word against the list? The list contains millions of words, and there are many possible misspellings.

PWS asked Jul 11 '17 06:07


People also ask

What algorithm does spell check use?

Spell checkers can use approximate string matching algorithms such as Levenshtein distance to find correct spellings of misspelled words. An alternative type of spell checker uses solely statistical information, such as n-grams, to recognize errors instead of correctly-spelled words.
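As a rough illustration of the Levenshtein approach, here is a minimal Python sketch. The function names `levenshtein` and `closest_word` are made up for this example, and the five-word list is just the sample from the question. Scanning the whole list like this is linear in the dictionary size, so it is probably too slow for millions of words without an index in front of it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_word(word, dictionary):
    """Hypothetical helper: the dictionary entry with the smallest distance."""
    return min(dictionary, key=lambda w: levenshtein(word, w))


words = ["halo", "saya", "sedangkan", "semangat", "cemooh"]
print(closest_word("ssya", words))
print(closest_word("sdngkan", words))
```

In practice you would likely also cap the accepted distance, so that a word with no reasonable match is left unchanged instead of being forced onto the nearest entry.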

How do you correct a spelling mistake in NLP?

Collect a dictionary of all the possible valid words in a language, along with their frequency in the language (i.e. a unigram language model). Generate a list of candidate edits of the word you want to correct by insertions, deletions, transpositions of adjacent characters, and character replacements. Keep the candidates that appear in the dictionary and pick the most frequent one.
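The candidate-generation idea can be sketched roughly as follows. This is only an illustration: `edits1`, `correct` and the frequency numbers in `word_freq` are invented for the example, and the alphabet is assumed to be lowercase ASCII. It only looks one edit away, so a word such as "smngat", which needs two insertions to reach "semangat", is returned unchanged.

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(word, freq):
    """Return the most frequent known word within one edit, else the word."""
    if word in freq:
        return word
    candidates = [w for w in edits1(word) if w in freq]
    if not candidates:
        return word  # nothing close enough; leave the word as it is
    return max(candidates, key=freq.get)


# Hypothetical frequencies, not real corpus counts.
word_freq = {"halo": 50, "saya": 120, "sedangkan": 30,
             "semangat": 40, "cemooh": 5}

print(correct("ssya", word_freq))
print(correct("hallo", word_freq))
```

Each word of length n produces on the order of 54n + 25 candidates with a 26-letter alphabet, and each candidate costs one hash lookup, so the dictionary size matters much less than in a linear scan.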

Which approach is used for spelling error detection and correction in NLP?

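Symmetric Delete Spelling Correction (SymSpell) is a simple but useful approach to correcting spelling errors. It generates only deletions, both for the dictionary words (ahead of time) and for the input word (at lookup time), and matches the two sets.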
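A heavily simplified sketch of the symmetric-delete idea is shown below. It is not the SymSpell library itself: `deletes`, `build_index` and `lookup` are names invented for this example, and the sketch skips the final edit-distance check and frequency ranking that a real implementation would apply to the candidates.

```python
from collections import defaultdict
from itertools import combinations


def deletes(word, max_deletes):
    """`word` plus every string made by removing up to max_deletes chars."""
    results = {word}
    for k in range(1, min(max_deletes, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            results.add("".join(word[i] for i in keep))
    return results


def build_index(dictionary, max_deletes=2):
    """Map each delete-variant to the dictionary words that produce it."""
    index = defaultdict(set)
    for w in dictionary:
        for d in deletes(w, max_deletes):
            index[d].add(w)
    return index


def lookup(word, index, max_deletes=2):
    """Dictionary words sharing at least one delete-variant with `word`."""
    candidates = set()
    for d in deletes(word, max_deletes):
        candidates |= index.get(d, set())
    return candidates


words = ["halo", "saya", "sedangkan", "semangat", "cemooh"]
index = build_index(words)
print(lookup("ssya", index))
print(lookup("smngat", index))
```

The index is built once and costs extra memory, but each lookup then needs only a handful of hash probes regardless of how many words the dictionary holds, which is the property that matters with millions of entries.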


1 Answer

It depends on how your data is stored, but you'll probably want to use a pattern-matching algorithm like Aho–Corasick. Of course, that assumes your input data structure is a trie. A trie is a very space-efficient storage container for words that may also be of interest to you (again, depending on your environment).
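For reference, a trie for the word list might look roughly like this in Python. This is only a sketch of the storage structure with exact and prefix lookup; it does not implement Aho–Corasick (which adds failure links on top of a trie), and the class and method names are chosen for this example.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_word = False  # True if a dictionary word ends here


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _walk(self, prefix):
        """Follow `prefix` from the root; return the last node or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None


trie = Trie(["halo", "saya", "sedangkan", "semangat", "cemooh"])
print("saya" in trie)
print(trie.has_prefix("se"))
```

Words that share a prefix, such as "sedangkan" and "semangat", share the nodes for that prefix, which is where the space saving comes from.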

Josh answered Sep 27 '22 16:09