How to check if a word is an English word with Python?

For (much) more power and flexibility, use a dedicated spellchecking library like PyEnchant. There's a tutorial, or you could just dive straight in:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>> d.suggest("Helo")
['He lo', 'He-lo', 'Hello', 'Helot', 'Help', 'Halo', 'Hell', 'Held', 'Helm', 'Hero', "He'll"]

PyEnchant comes with a few dictionaries (en_GB, en_US, de_DE, fr_FR), but can use any of the OpenOffice ones if you want more languages.
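Since the installed dictionaries vary from system to system, it can help to check that a language is available before constructing the `Dict`. Here is a minimal sketch, assuming PyEnchant may or may not be installed. `make_checker` is a hypothetical helper name, not part of PyEnchant; `enchant.dict_exists` and `enchant.Dict` are the library calls it relies on.

```python
def make_checker(tag="en_US"):
    """Return an enchant.Dict for `tag`, or None if it is unavailable.

    make_checker is a hypothetical helper, not part of PyEnchant.
    """
    try:
        import enchant  # third-party package: pip install pyenchant
    except ImportError:
        return None  # PyEnchant (or its underlying C library) is missing
    if not enchant.dict_exists(tag):
        return None  # no dictionary installed for this language tag
    return enchant.Dict(tag)


checker = make_checker("en_US")
if checker is None:
    print("No en_US dictionary available")
else:
    print(checker.check("Hello"))
```

Returning `None` instead of raising lets the caller fall back to another approach, such as a plain word list.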

There appears to be a pluralisation library called inflect, but I've no idea whether it's any good.

Checking against WordNet won't work well, because WordNet does not contain all English words. Another NLTK-based possibility, without enchant, is NLTK's words corpus:

>>> from nltk.corpus import words
>>> "would" in words.words()
True
>>> "could" in words.words()
True
>>> "should" in words.words()
True
>>> "I" in words.words()
True
>>> "you" in words.words()
True

Using NLTK:

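from nltk.corpus import wordnet  # needs the WordNet data: nltk.download("wordnet")

word_to_test = "hello"  # example input

# synsets() returns an empty list when WordNet has no entry for the word
if not wordnet.synsets(word_to_test):
    print("Not an English word")
else:
    print("English word")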

You should refer to this article if you have trouble installing wordnet or want to try other approaches.

Use a set to store the word list, because lookups will be faster:

with open("english_words.txt") as word_file:
    english_words = set(word.strip().lower() for word in word_file)

def is_english_word(word):
    return word.lower() in english_words

print(is_english_word("ham"))  # should be True if you have a good english_words.txt

To answer the second part of the question, the plurals would already be in a good word list, but if you wanted to specifically exclude those from the list for some reason, you could indeed write a function to handle it. But English pluralization rules are tricky enough that I'd just include the plurals in the word list to begin with.
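To illustrate why hand-written plural handling is tricky, here is a rough sketch of such a function. It assumes a tiny stand-in word list (`BASE_WORDS`) and three naive suffix rules; the names `singular_candidates` and `is_word_or_plural` are made up for this example.

```python
# Tiny stand-in word list; a real one would be loaded from a file.
BASE_WORDS = {"cat", "box", "city", "ham"}


def singular_candidates(word):
    """Yield possible singular forms using three naive suffix rules."""
    if word.endswith("ies") and len(word) > 3:
        yield word[:-3] + "y"   # cities -> city
    if word.endswith("es") and len(word) > 2:
        yield word[:-2]         # boxes -> box
    if word.endswith("s") and len(word) > 1:
        yield word[:-1]         # cats -> cat


def is_word_or_plural(word, words=BASE_WORDS):
    """True if the word, or a guessed singular form of it, is in the list."""
    word = word.lower()
    if word in words:
        return True
    return any(candidate in words for candidate in singular_candidates(word))


print(is_word_or_plural("cities"))  # True
print(is_word_or_plural("boxes"))   # True
print(is_word_or_plural("mice"))    # False: irregular plurals are missed
```

The rules also accept non-words such as "hames" (because "ham" is in the list), which is one reason a word list that already contains the plurals is usually the simpler choice.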

As to where to find English word lists, I found several just by Googling "English word list". Here is one: http://www.sil.org/linguistics/wordlists/english/wordlist/wordsEn.txt You could Google for British or American English if you want specifically one of those dialects.
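Many Unix-like systems also ship a word list at `/usr/share/dict/words`. The sketch below assumes that path, which is not guaranteed to exist, and falls back to a message when it is missing; `load_words` is a hypothetical helper name.

```python
import os


def load_words(path):
    """Read one word per line into a lowercase set, skipping blank lines."""
    with open(path, encoding="utf-8", errors="replace") as word_file:
        return {line.strip().lower() for line in word_file if line.strip()}


# Assumed location: present on many Unix-like systems, but not guaranteed.
SYSTEM_WORD_LIST = "/usr/share/dict/words"

if os.path.exists(SYSTEM_WORD_LIST):
    english_words = load_words(SYSTEM_WORD_LIST)
    print(len(english_words), "words loaded")
else:
    print("No system word list found; download one and pass its path to load_words")
```

The same `load_words` function works for any downloaded list that has one word per line.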

For a faster NLTK-based solution you could hash the set of words to avoid a linear search.

from nltk.corpus import words as nltk_words

# Build the set once, outside the function; each membership
# test is then a hash lookup rather than a linear search.
english_vocab = set(nltk_words.words())

def is_english_word(word):
    return word in english_vocab
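To see how much the hashed lookup matters, the comparison can be timed with the standard library. This sketch uses a synthetic list of made-up strings in place of the NLTK corpus, so the absolute numbers are only indicative.

```python
import timeit

# Synthetic stand-in for a large word list (not real English words).
word_list = ["word%d" % i for i in range(100_000)]
word_set = set(word_list)

target = "word99999"  # last element: worst case for the linear list scan

# Time 100 membership tests against each container.
list_time = timeit.timeit(lambda: target in word_list, number=100)
set_time = timeit.timeit(lambda: target in word_set, number=100)

print("list: %.4f s, set: %.6f s" % (list_time, set_time))
```

The list scan has to walk every element before it finds the target, while the set lookup takes roughly the same time regardless of how many words are stored.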
