 

Autocorrect Spell Checker

I have a TSV (tab-separated values) file that I need to spell-check for misspellings and combined words (e.g., 'I love you' vs. 'Iloveyou').

I've installed Aspell on my machine and can run it through R using the aspell() function.

files <- "train2.tsv"
res <- aspell(files)
str(res)
summary(res)

However, the output from running it in R is just a list of misspelled words and possible suggestions.

>  summary(res)
Possibly mis-spelled words:
 [1] "amant"        "contaneir"    "creat"        "ddition"      "EssaySet"     "EssayText"    "experiament"  "expireiment"  "expirement"  
[10] "Fipst"        "infomation"   "Inorder"      "measureing"   "mintued"      "neccisary"    "officialy"    "renuminering" "rinsen"      
[19] "sticlenx"     "sucessfully"  "tipe"         "vineager"     "vinigar"      "yar"   

>  str(res)
Classes ‘aspell’ and 'data.frame':      27 obs. of  5 variables:
 $ Original   : chr  "EssaySet" "EssayText" "expirement" "expireiment" ...
 $ File       : chr  "train2.tsv" "train2.tsv" "train2.tsv" "train2.tsv" ...
 $ Line       : int  1 1 3 3 3 3 3 3 6 6 ...
 $ Column     : int  4 27 27 108 132 222 226 280 120 156 ...
 $ Suggestions:List of 27
  ..$ : chr  "Essay Set" "Essay-Set" "Essayist" "Essays" ...
  ..$ : chr  "Essay Text" "Essay-Text" "Essayist" "Sedatest" ...
  ..$ : chr  "experiment" "excrement" "excitement" "experiments" ...
  ..$ : chr  "experiment" "experiments" "experimenter" "excrement" ...
  ..$ : chr  "Amandy" "am ant" "am-ant" "Amanda" ...
  ..$ : chr  "year" "ya" "Yard" "yard" ...

Is there a way to have aspell (or any other spell checker) automatically correct misspelled words?

asked Jul 07 '12 at 06:07 by screechOwl



1 Answer

It looks like you can do the following with the aspell-python bindings:

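import aspell  # aspell-python bindings

s = aspell.Speller('lang', 'en')  # load an English dictionary
text_to_check = "the vinigar experiament worked sucessfully"
corrected = []
for word in text_to_check.split():
    if not s.check(word):
        new_words = s.suggest(word)
        if new_words:
            word = new_words[0]  # pick the first word from the returned list
    corrected.append(word)
print(" ".join(corrected))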

From a quick glance over the documentation, that looks like what you would need to do to automatically use the suggested spelling.

http://0x80.pl/proj/aspell-python/index-c.html
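A plain loop over whitespace-separated words would lose the tab separators that make the file a TSV. The sketch below instead rewrites each line in place with a regex callback, so tabs, digits and punctuation pass through untouched. The helper names correct_line and correct_tsv, the word pattern, and the choice to skip the header row are all assumptions for illustration; the speller is passed in as two functions (check and suggest) so a toy stand-in can be used here. Blindly taking the first suggestion is risky: the R output above proposes "Amandy" for "amant" and "Essay Set" for the column name "EssaySet".

```python
import re
from typing import Callable, List

# Assumption: a "word" is a run of ASCII letters/apostrophes. Tabs, digits and
# punctuation are never matched, so the TSV structure passes through unchanged.
WORD_RE = re.compile(r"[A-Za-z']+")


def correct_line(line: str,
                 check: Callable[[str], bool],
                 suggest: Callable[[str], List[str]]) -> str:
    """Hypothetical helper: replace each word failing `check` with the first suggestion."""
    def fix(match):
        word = match.group(0)
        if check(word):
            return word
        suggestions = suggest(word)
        # Keep the original word when the speller has nothing to offer.
        return suggestions[0] if suggestions else word
    return WORD_RE.sub(fix, line)


def correct_tsv(in_path: str, out_path: str,
                check: Callable[[str], bool],
                suggest: Callable[[str], List[str]],
                skip_header: bool = True) -> None:
    """Hypothetical helper: write a corrected copy of a TSV file line by line."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for i, line in enumerate(src):
            if skip_header and i == 0:
                dst.write(line)  # column names such as "EssaySet" are not prose
            else:
                dst.write(correct_line(line, check, suggest))


# Toy stand-in speller for demonstration only; in practice pass a real
# speller's check/suggest methods instead.
KNOWN = {"the", "vinegar", "experiment"}
FIXES = {"vinigar": ["vinegar"], "expirement": ["experiment"]}

print(repr(correct_line("the vinigar\texpirement", KNOWN.__contains__,
                        lambda w: FIXES.get(w, []))))
# → 'the vinegar\texperiment'
```

With aspell-python, something like speller = aspell.Speller('lang', 'en') followed by correct_tsv('train2.tsv', 'train2_corrected.tsv', speller.check, speller.suggest) should connect the two, though that wiring is untested here. Reviewing the replacements before overwriting the original file is a sensible precaution.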

Edit: I realize you may not be looking for Python code, but this would be the easiest way to do it in Python, since the question was tagged with python. There is probably a more efficient method, but it's getting late and this came to mind first.
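For combined words such as 'Iloveyou' from the question, the first suggestion may not be a split at all; aspell does sometimes propose two-word splits (the output above lists "Essay Set" for "EssaySet"), but longer run-together strings are a different problem. One option, swapped in here and not part of aspell, is dictionary-based word segmentation with dynamic programming: find the split of the string into the fewest known words. The function segment and the tiny word list WORDS below are hypothetical; a real run would load a full dictionary (for example a system word list, if one is available).

```python
from functools import lru_cache
from typing import FrozenSet, List, Optional


def segment(text: str, words: FrozenSet[str], max_len: int = 20) -> Optional[List[str]]:
    """Hypothetical helper: split `text` into the fewest dictionary words.

    Matching is case-insensitive, the original casing of each piece is kept,
    and None is returned when no full segmentation exists.
    """
    lower = text.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[tuple]:
        # Shortest tuple of (start, end) spans covering lower[start:], or None.
        if start == len(lower):
            return ()
        result = None
        for end in range(start + 1, min(len(lower), start + max_len) + 1):
            if lower[start:end] in words:
                rest = best(end)
                if rest is not None and (result is None or 1 + len(rest) < len(result)):
                    result = ((start, end),) + rest
        return result

    spans = best(0)
    if spans is None:
        return None
    return [text[s:e] for s, e in spans]


# Hypothetical tiny word list for illustration only.
WORDS = frozenset({"i", "love", "you", "lo", "ve", "essay", "set", "text"})

print(segment("Iloveyou", WORDS))  # → ['I', 'love', 'you']
print(segment("EssaySet", WORDS))  # → ['Essay', 'Set']
print(segment("xyz", WORDS))       # → None
```

The fewest-words rule is what prefers "I love you" over "I lo ve you" here; with a large real dictionary, weighting candidate splits by word frequency usually picks more natural splits, at the cost of needing frequency data.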

answered Nov 03 '22 at 00:11 by sean
