 

Is there a corpus of English words in NLTK?

Tags:

nltk

Is there any way to get a list of English words from the Python NLTK library? I tried to find one, but the only thing I found is wordnet from nltk.corpus. Based on the documentation, it does not do what I need (it finds synonyms for a word).

I know how to build such a list on my own (this answer covers it in detail), so I am interested in whether I can do this using only the NLTK library.

asked Feb 05 '15 at 08:02 by Salvador Dali




1 Answer

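Yes. Import the words corpus from nltk.corpus and check membership against words.words():

    >>> from nltk.corpus import words
    >>> "fine" in words.words()
    True

If the corpus data is not installed yet, run nltk.download('words') once first; otherwise NLTK raises a LookupError.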

Reference: Section 4.1 (Wordlist Corpora), chapter 2 of Natural Language Processing with Python.
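Because words.words() returns a plain Python list, every `in` check scans the whole list. If you need many lookups, a minimal sketch (not part of the original answer) is to load the wordlist once into a lowercase set and filter tokens against it, along the lines of the `unusual_words` example in the book section cited above. The helper names `load_english_vocab` and `unknown_words` are hypothetical, and the automatic `nltk.download("words")` fallback assumes network access.

```python
from typing import Iterable, Set


def load_english_vocab() -> Set[str]:
    """Load NLTK's 'words' wordlist as a lowercase set (hypothetical helper).

    A set gives average O(1) membership tests, whereas `in` on the list
    returned by words.words() checks the elements one by one.
    """
    # Imported lazily so unknown_words() below can be used without NLTK.
    import nltk
    from nltk.corpus import words

    try:
        word_list = words.words()
    except LookupError:
        # Assumption: corpus data is missing, so download it once and retry.
        nltk.download("words", quiet=True)
        word_list = words.words()
    # Lowercase so "Fine" and "fine" are treated as the same word.
    return {w.lower() for w in word_list}


def unknown_words(tokens: Iterable[str], vocab: Set[str]) -> Set[str]:
    """Return lowercased alphabetic tokens missing from `vocab`
    (hypothetical helper). Numbers and punctuation are skipped."""
    return {t.lower() for t in tokens if t.isalpha() and t.lower() not in vocab}
```

With the corpus available, `vocab = load_english_vocab()` followed by `unknown_words("Some text here".split(), vocab)` returns the tokens the wordlist does not recognize. Passing the vocabulary in as a parameter keeps the filter independent of NLTK, so any other word list can be swapped in. Inflected forms such as plurals may show up as unknown, since the list does not contain every form of every word.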

answered Sep 22 '22 at 15:09 by axiom



