I am looking for a proper solution to this question. It has been asked many times before, but none of the answers suited my needs. I need to use a corpus in NLTK to detect whether a word is an English word.
I have tried:
wordnet.synsets(word)
This doesn't work for many common words. Using a list of English words and performing a lookup in a file is not an option, and neither is using enchant. If there is another library that can do the same, please show how its API is used. If not, please point to a corpus in NLTK that contains all the words in English.
Using the isalpha() method to check whether a word is purely alphabetic: in Python, string objects have a method called isalpha(), which returns True if all characters in the string are alphabetic and False if even a single character is not. Note that this only checks the characters themselves, not whether the word is actually English.
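For instance, a quick REPL sketch (the sample strings are my own) showing what isalpha() does and does not tell you:
>>> 'hello'.isalpha()
True
>>> 'qwzxv'.isalpha()     # all letters, but not an English word
True
>>> 'hello123'.isalpha()  # contains digits
False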
If you use the word while talking to people who speak the same dialect as you and they understand it as an English word, then it is an English word in your dialect. If you hear people using it in other dialects too, then it has broader currency.
Verbs always tell the time (also called the tense) of the sentence. The easiest way to find a verb in a sentence is to change the time of the sentence and find the word that changes.
We considered dusting off the dictionary and going from A1 to Zyzzyva; however, there are an estimated 171,476 words currently in use in the English language, according to the Oxford English Dictionary, not to mention 47,156 obsolete words.
NLTK includes some corpora that are nothing more than wordlists. The Words Corpus is the /usr/share/dict/words file from Unix, used by some spell checkers. We can use it to find unusual or misspelt words in a text corpus, as shown below:
import nltk  # requires the wordlist: nltk.download('words')

def unusual_words(text):
    # Keep only purely alphabetic, lowercased tokens from the input text
    text_vocab = set(w.lower() for w in text.split() if w.isalpha())
    # Build a set of known English words from the Words Corpus
    english_vocab = set(w.lower() for w in nltk.corpus.words.words())
    # Words in the text that are not in the wordlist
    unusual = text_vocab - english_vocab
    return sorted(unusual)
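For example (assuming the words corpus has been downloaded; the sample sentence is my own, and the output should look roughly like this):
>>> unusual_words("this is a mispelt sentense")
['mispelt', 'sentense']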
And in this case you can check the membership of your word in english_vocab:
>>> import nltk
>>> english_vocab = set(w.lower() for w in nltk.corpus.words.words())
>>> 'a' in english_vocab
True
>>> 'this' in english_vocab
True
>>> 'nothing' in english_vocab
True
>>> 'nothingg' in english_vocab
False
>>> 'corpus' in english_vocab
True
>>> 'Terminology'.lower() in english_vocab
True
>>> 'sorted' in english_vocab
True
I tried the above approach, but it failed for many words that should exist, so I tried WordNet instead. I think it has a more comprehensive vocabulary:
from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

if wordnet.synsets(word):
    pass  # do something: WordNet has at least one synset for the word
else:
    pass  # do some other thing: the word is not in WordNet
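Putting the two together, here is a minimal sketch (the function name is_english_word is my own, not a standard NLTK API) that accepts a word if either the Words Corpus or WordNet knows it:
from nltk.corpus import wordnet, words  # requires: nltk.download('words') and nltk.download('wordnet')

# Build the wordlist set once; membership checks are then O(1)
english_vocab = set(w.lower() for w in words.words())

def is_english_word(word):
    w = word.lower()
    # Accept if the word is in the Unix wordlist or known to WordNet
    return w in english_vocab or len(wordnet.synsets(w)) > 0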