 

FreqDist with NLTK

Tags:

python

nlp

nltk

NLTK in Python has a FreqDist class that gives you the frequency of words within a text. I am trying to pass my text as an argument, but the result looks like this:

[' ', 'e', 'a', 'o', 'n', 'i', 't', 'r', 's', 'l', 'd', 'h', 'c', 'y', 'b', 'u', 'g', '\n', 'm', 'p', 'w', 'f', ',', 'v', '.', "'", 'k', 'B', '"', 'M', 'H', '9', 'C', '-', 'N', 'S', '1', 'A', 'G', 'P', 'T', 'W', '[', ']', '(', ')', '0', '7', 'E', 'J', 'O', 'R', 'j', 'x']

whereas in the example on the NLTK website the result was whole words, not just letters. I'm doing it this way:

from nltk import FreqDist

file_y = open(fileurl)
p = file_y.read()
fdist = FreqDist(p)
vocab = fdist.keys()
vocab[:100]

Do you know what I'm doing wrong? Thanks!

afg102 asked Jan 08 '11 16:01



People also ask

What is FreqDist?

FreqDist (a class, not a plain function) records how many times each sample occurs, for example each word in a tokenized text. This is useful in probability calculations: a frequency distribution counts the number of times each outcome of an experiment occurs.
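To make the probability angle concrete, here is a small sketch that treats ten die rolls as the outcomes of an experiment. The roll values are made up for illustration; `N()` and `freq()` are FreqDist methods for the total count and the relative frequency of a sample.

```python
from nltk import FreqDist

# Hypothetical outcomes of an experiment: ten rolls of a die
rolls = [3, 1, 3, 6, 2, 3, 5, 1, 3, 6]
fdist = FreqDist(rolls)

print(fdist[3])       # how many times the outcome 3 occurred
print(fdist.N())      # total number of outcomes recorded
print(fdist.freq(3))  # relative frequency: fdist[3] / fdist.N()
```

Here 3 appears 4 times out of 10 rolls, so `freq(3)` is 0.4, which is the estimated probability of rolling a 3.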

Which method is used to view the conditions used when computing a conditional frequency distribution?

After computing a conditional frequency distribution, use its tabulate() method to view the counts. Its conditions and samples arguments select which conditions (rows) and samples (columns) appear in the table.
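A minimal sketch of `ConditionalFreqDist.tabulate`, assuming a toy list of (genre, word) pairs invented for this example. The pairs are the (condition, sample) input that `ConditionalFreqDist` expects.

```python
from nltk import ConditionalFreqDist

# Hypothetical (condition, sample) pairs: a genre and a word seen in that genre
pairs = [
    ("news", "the"), ("news", "market"), ("news", "the"),
    ("fiction", "the"), ("fiction", "dragon"), ("fiction", "dragon"),
]
cfd = ConditionalFreqDist(pairs)

# Print a table with the chosen conditions as rows and samples as columns
cfd.tabulate(conditions=["news", "fiction"], samples=["the", "dragon", "market"])
```

Each condition maps to its own FreqDist, so `cfd["news"]["the"]` is 2, and a sample never seen under a condition (such as "dragon" under "news") counts as 0.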


1 Answer

FreqDist expects an iterable of tokens. A string is iterable too, but iterating over it yields one character at a time, so FreqDist ends up counting characters.
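The difference is easy to see with the standard library alone. This sketch uses `collections.Counter` as a stand-in, on the assumption (true of NLTK 3, where FreqDist subclasses Counter) that FreqDist consumes its argument the same way; `str.split()` is only a crude whitespace tokenizer for the demo.

```python
from collections import Counter

text = "to be or not to be"

# Iterating over a string yields characters, so this counts letters and spaces
char_counts = Counter(text)

# Iterating over a list of tokens yields words, so this counts words
word_counts = Counter(text.split())

print(char_counts.most_common(3))
print(word_counts.most_common(2))
```

The first result is dominated by spaces and single letters, which is exactly the shape of the output in the question.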

Pass your text to a tokenizer first, and pass the tokens to FreqDist.
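A sketch of that fix, assuming `wordpunct_tokenize` from `nltk.tokenize` is an acceptable tokenizer (it is regex-based and needs no downloaded models, unlike `word_tokenize`). The helper name `word_frequencies` and the sample sentence are made up for illustration; with a file you would pass `open(fileurl).read()` instead.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize


def word_frequencies(text):
    # Split the text into word and punctuation tokens
    tokens = wordpunct_tokenize(text)
    # Lowercase so "The" and "the" are counted together; drop punctuation tokens
    words = [t.lower() for t in tokens if t.isalpha()]
    return FreqDist(words)


fdist = word_frequencies("The cat sat on the mat. The dog sat too.")
print(fdist.most_common(2))
```

Lowercasing and dropping punctuation are design choices, not requirements; skip them if case or punctuation matter to you. Also note that in NLTK 3 `keys()` is no longer sorted by frequency, so `fdist.most_common(100)` is the way to get the top 100 words.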

Alex Brasetvik answered Oct 10 '22 21:10
