I have a TSV (tab-separated values) file that I need to check for misspellings and run-together words (e.g. 'Iloveyou' instead of 'I love you').
I've installed Aspell on my machine and can run it through R using the aspell() function.
files <- "train2.tsv"
res <- aspell(files)
str(res)
summary(res)
However, the output from running it in R is just a list of misspelled words and possible suggestions.
> summary(res)
Possibly mis-spelled words:
[1] "amant" "contaneir" "creat" "ddition" "EssaySet" "EssayText" "experiament" "expireiment" "expirement"
[10] "Fipst" "infomation" "Inorder" "measureing" "mintued" "neccisary" "officialy" "renuminering" "rinsen"
[19] "sticlenx" "sucessfully" "tipe" "vineager" "vinigar" "yar"
> str(res)
Classes ‘aspell’ and 'data.frame': 27 obs. of 5 variables:
$ Original : chr "EssaySet" "EssayText" "expirement" "expireiment" ...
$ File : chr "train2.tsv" "train2.tsv" "train2.tsv" "train2.tsv" ...
$ Line : int 1 1 3 3 3 3 3 3 6 6 ...
$ Column : int 4 27 27 108 132 222 226 280 120 156 ...
$ Suggestions:List of 27
..$ : chr "Essay Set" "Essay-Set" "Essayist" "Essays" ...
..$ : chr "Essay Text" "Essay-Text" "Essayist" "Sedatest" ...
..$ : chr "experiment" "excrement" "excitement" "experiments" ...
..$ : chr "experiment" "experiments" "experimenter" "excrement" ...
..$ : chr "Amandy" "am ant" "am-ant" "Amanda" ...
..$ : chr "year" "ya" "Yard" "yard" ...
Is there a way to have aspell (or any other spell checker) automatically correct misspelled words?
It looks like you can do the following:
s = load_up_users_dictionary()  # placeholder: returns the spell-checker object
for word in text_to_check:
    if word not in s:
        new_words = s.suggest(word)
        if new_words:  # there may be no suggestions at all
            # Pick the first word from the returned list.
            replace_incorrect_word(word, new_words[0])
From a quick glance over the documentation, that looks like what you would have to do to automatically apply the suggested spelling.
http://0x80.pl/proj/aspell-python/index-c.html
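The loop above can be turned into a small runnable function. This is only a sketch under stated assumptions: `autocorrect`, `toy_check` and `toy_suggest` are hypothetical names, and the toy dictionary stands in for a real speller so the example runs without Aspell installed. With aspell-python you would presumably pass the speller's own check and suggest methods instead; I have not tested that here.

```python
import re

# Words are runs of letters and apostrophes; everything else is left untouched.
WORD_RE = re.compile(r"[A-Za-z']+")

def autocorrect(text, check, suggest):
    """Replace every word that fails check() with the first suggestion."""
    def fix(match):
        word = match.group(0)
        if check(word):
            return word
        candidates = suggest(word)
        # Keep the original word when the checker has no suggestion.
        return candidates[0] if candidates else word
    return WORD_RE.sub(fix, text)

# Toy stand-ins for a real spell checker (assumption, for illustration only).
KNOWN = {"the", "experiment", "used", "vinegar"}
SUGGESTIONS = {
    "expirement": ["experiment", "excrement"],
    "vinigar": ["vinegar"],
}

def toy_check(word):
    return word.lower() in KNOWN

def toy_suggest(word):
    return SUGGESTIONS.get(word.lower(), [])

corrected = autocorrect("the expirement used vinigar", toy_check, toy_suggest)
print(corrected)
```

Blindly taking the first suggestion is the weak point: for "amant" the first suggestion in the question's output is "Amandy", so you may want to review the replacements rather than trust them.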
Edit: I realize you may not be looking for Python code, but since the question was tagged with python, this would be the easiest way to do it in Python. There is probably a more efficient method, but it's getting late and this came to mind first.
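Since the input is a TSV file, you would probably want to correct only the text column and leave the header alone (the question's output shows "EssaySet" and "EssayText" on line 1 being flagged). A rough sketch using the standard csv module follows; `correct_column` and `toy_fix` are hypothetical names, and I am assuming the first line of the file holds the column names.

```python
import csv
import io

def correct_column(tsv_text, column, fix):
    """Return tsv_text with fix() applied to every value in one column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=reader.fieldnames,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()  # header row is copied through unchanged
    for row in reader:
        row[column] = fix(row[column])
        writer.writerow(row)
    return out.getvalue()

# Toy correction function (assumption); swap in a real spell-check routine.
def toy_fix(text):
    return text.replace("vinigar", "vinegar")

sample = "EssaySet\tEssayText\n1\tadd vinigar\n"
result = correct_column(sample, "EssayText", toy_fix)
print(result)
```

For a real file you would read the text from train2.tsv and write the result to a new file rather than overwriting the original.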