I'm unsure if I've understood correctly how the FreqDist function works in Python. As I am following a tutorial, I am led to believe that the following code constructs a frequency distribution for a given list of words and returns the top x most frequently used words. (In the example below, let corpus be an NLTK corpus and 'file.txt' be the name of a file in that corpus.)
words = corpus.words('file.txt')
fd_words = nltk.FreqDist(word.lower() for word in words)
fd_words.items()[:x]
However, when I go through the following commands on Python, it seems to suggest otherwise:
>>> from nltk import *
>>> fdist = FreqDist(['hi','my','name','is','my','name'])
>>> fdist
FreqDist({'my': 2, 'name': 2, 'is': 1, 'hi': 1})
>>> fdist.items()
[('is', 1), ('hi', 1), ('my', 2), ('name', 2)]
>>> fdist.items()[:2]
[('is', 1), ('hi', 1)]
So is fdist.items()[:x] in fact returning the x least common words?
Can someone tell me if I have done something wrong or if the mistake lies in the tutorial I am following?
A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Frequency distributions are encoded by the FreqDist class, which is defined by the nltk.probability module.
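For instance, here is a minimal sketch of recording word frequencies with FreqDist (the toy document below is made up for illustration):

from nltk import FreqDist

# Record how many times each word type occurs in a small "document".
document = ['the', 'cat', 'sat', 'on', 'the', 'mat']
fdist = FreqDist(document)

print(fdist['the'])  # 2 -- 'the' occurred twice
print(fdist.N())     # 6 -- total number of outcomes recorded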
NLTK includes a small selection of texts from the Project Gutenberg electronic text archive, which contains some 25,000 free electronic books, hosted at http://www.gutenberg.org/.
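As a quick illustration (assuming the Gutenberg data has already been fetched with nltk.download('gutenberg')):

from nltk.corpus import gutenberg

# List the files bundled in NLTK's Gutenberg selection.
print(gutenberg.fileids())

# Read one of them as a list of words, just like corpus.words('file.txt') above.
emma = gutenberg.words('austen-emma.txt')
print(len(emma))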
A corpus can be defined as a collection of text documents. It can be thought of as just a bunch of text files in a directory, often alongside many other directories of text files. How is this done? NLTK already defines a list of data paths, or directories, in which to look for corpora.
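For example, you can inspect those search paths directly (nltk.data.path is the list NLTK consults when loading corpus data):

import nltk

# Directories NLTK searches for corpus data, in order.
print(nltk.data.path)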
By default, a FreqDist is not sorted. I think you are looking for the most_common method:
from nltk import FreqDist
fdist = FreqDist(['hi','my','name','is','my','name'])
fdist.most_common(2)
Returns:
[('my', 2), ('name', 2)]
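Calling most_common() with no argument returns the whole distribution sorted by descending frequency, which is presumably what the tutorial's fdist.items()[:x] was meant to achieve:

fdist.most_common()   # [('my', 2), ('name', 2), ('hi', 1), ('is', 1)] (tie order may vary)
fdist.most_common(2)  # just the two most frequent words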