
Python Spell Checker

I need a spell checker in python. I've looked at previous answers and they all seem to be outdated now or not applicable:

  • "Python spell checker using a trie": this question is more about the data structure.
  • "Python Spell Checker": this is a spelling corrector, given two strings.
  • http://norvig.com/spell-correct.html: often referenced and quite interesting, but it is also a spelling corrector, and its accuracy isn't quite good enough, though I'll probably use it in combination with a checker.
  • "Spell Checker for Python": uses pyenchant, which isn't maintained anymore.
  • "Python: check whether a word is spelled correctly": also suggests pyenchant, which isn't maintained.

Some details of what I need:

  • A function that accepts a string (a word) and returns a boolean indicating whether the word is valid English or not. The unit test would expect True for an input of "car" and False for an input of "ijjk".
  • Accuracy needs to be above 90%, but doesn't need to be much higher than that. I'm just using this to exclude words during preprocessing for document classification. Most of the errors will be picked up anyway as words that appear too seldom (though not all). Spell correcting won't work in all cases because a lot of the errors are OCR issues that are too far off to fix.
  • If it can deal with legal terms that would be a big plus. Otherwise I might need to manually add certain terms to the dictionary.

What's the best approach here? Are there any maintained libraries? Do I need to download a dictionary and check against it?

asked Feb 07 '26 00:02 by Neil


1 Answer

Two recent Python libraries, both based on Levenshtein minimum edit distance and optimized for the task:

  • symspellpy, released at the end of 2019, and
  • spello, released in 2020
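
For readers unfamiliar with the term, Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is the plain textbook dynamic-programming version, shown only to illustrate the idea. It is not the optimized lookup either library uses, and the function name `levenshtein` is just an illustrative choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("car", "cat"))         # 1
```

A word that is a small distance away from a dictionary entry is a candidate for correction, while a word that is far from every entry (as with badly garbled OCR output) is more likely to be simply invalid.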

Note that symspellpy is a Python port of the original SymSpell C# implementation, which is described separately. The original SymSpell GitHub repository includes a dictionary with word frequencies.
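
If all you need is a valid/invalid check rather than correction, a frequency dictionary like that can be used directly as a word list. The following is a minimal sketch using only the standard library, assuming the file has one space-separated "term count" pair per line. The sample data, the counts, the helper names `load_vocabulary` and `is_valid_word`, and the extra legal terms are all made up for illustration.

```python
import io

# Hypothetical sample in "term count" format; the counts are invented and
# a real frequency dictionary would be far larger.
SAMPLE_DICTIONARY = """\
the 1000
car 120
court 95
"""


def load_vocabulary(lines, min_count=1, extra_terms=()):
    """Build a set of accepted lowercase words from 'term count' lines."""
    vocabulary = {term.lower() for term in extra_terms}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        term, count = parts[0], int(parts[1])
        if count >= min_count:
            vocabulary.add(term.lower())
    return vocabulary


def is_valid_word(word, vocabulary):
    """Return True if the stripped, lowercased word is in the vocabulary."""
    return word.strip().lower() in vocabulary


vocabulary = load_vocabulary(
    io.StringIO(SAMPLE_DICTIONARY),          # replace with open(path) for a real file
    extra_terms=["estoppel", "tortfeasor"],  # illustrative domain terms added by hand
)
print(is_valid_word("car", vocabulary))       # True
print(is_valid_word("ijjk", vocabulary))      # False
print(is_valid_word("Estoppel", vocabulary))  # True
```

Raising `min_count` drops rare entries, and `extra_terms` is one simple way to add legal vocabulary that a general-purpose dictionary lacks.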

Spello includes a basic model pre-trained on 30K news and 30K Wikipedia articles, but it's better to train it on a custom corpus from your own domain.
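
To give a rough idea of what learning from your own corpus involves, here is a small standard-library sketch that counts words across documents and emits "term count" lines. It is a stand-in for illustration only, not spello's training API. The function name `build_frequency_lines`, the sample documents, and the letters-only tokenization are assumptions.

```python
import re
from collections import Counter


def build_frequency_lines(documents, min_count=2):
    """Count words across documents and return 'term count' lines."""
    counts = Counter()
    for text in documents:
        # Assumption: a word is a run of ASCII letters; adjust for your data.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return [
        f"{term} {count}"
        for term, count in counts.most_common()
        if count >= min_count  # words seen too seldom are dropped
    ]


# Hypothetical mini-corpus; "Tbe" imitates an OCR error.
documents = [
    "The plaintiff filed a motion.",
    "The court denied the motion.",
    "Tbe plaintiff appealed.",
]
for line in build_frequency_lines(documents):
    print(line)
```

With a threshold of 2, one-off tokens such as the OCR error "tbe" are left out, although so are legitimate words that happen to occur only once, so the threshold needs tuning against a real corpus.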

answered Feb 08 '26 13:02 by denis_smyslov


