NLTK in Python has a function FreqDist which gives you the frequency of words within a text. I am trying to pass my text as an argument, but the result is of the form:
[' ', 'e', 'a', 'o', 'n', 'i', 't', 'r', 's', 'l', 'd', 'h', 'c', 'y', 'b', 'u', 'g', '\n', 'm', 'p', 'w', 'f', ',', 'v', '.', "'", 'k', 'B', '"', 'M', 'H', '9', 'C', '-', 'N', 'S', '1', 'A', 'G', 'P', 'T', 'W', '[', ']', '(', ')', '0', '7', 'E', 'J', 'O', 'R', 'j', 'x']
whereas in the example on the NLTK website the result was whole words, not just letters. I'm doing it this way:
file_y = open(fileurl)
p = file_y.read()
fdist = FreqDist(p)
vocab = fdist.keys()
vocab[:100]
Do you know what I'm doing wrong, please? Thanks!
FreqDist gives the user the frequency distribution of the items it is given, which is the distribution of all the words in the text when it is passed a list of words. This is particularly helpful in probability calculations, where a frequency distribution counts the number of times that each outcome of an experiment occurs.
After computing a conditional frequency distribution (ConditionalFreqDist), the tabulate method can be used to view the counts; its conditions and samples arguments limit which conditions and samples are displayed.
FreqDist expects an iterable of tokens. A string is iterable, and its iterator yields every character, which is why you get single letters. Pass your text to a tokenizer first, and pass the tokens to FreqDist.
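To make the difference concrete, here is a minimal sketch rather than a definitive fix. The sample text and the simple regex tokenizer are assumptions for illustration; in real code you would read your file and would probably use one of NLTK's own tokenizers (for example nltk.word_tokenize, which needs the tokenizer data downloaded first). The fallback to collections.Counter is only there so the snippet runs without NLTK installed; FreqDist is a subclass of Counter in NLTK 3.

```python
import re
from collections import Counter

try:
    from nltk import FreqDist
except ImportError:
    # Assumption for this sketch: fall back to Counter, which FreqDist
    # subclasses in NLTK 3, so the example still runs without NLTK.
    FreqDist = Counter

# Hypothetical sample text; in the question this would be file_y.read().
text = "the cat sat on the mat. The cat slept."

# Passing the raw string: iteration yields characters, so letters are counted.
char_fdist = FreqDist(text)

# Tokenize first (a simple regex stand-in for a real tokenizer),
# then pass the list of word tokens.
tokens = re.findall(r"\w+", text.lower())
word_fdist = FreqDist(tokens)

print(word_fdist.most_common(2))  # [('the', 3), ('cat', 2)]
```

Note that in Python 3, fdist.keys() returns a view that cannot be sliced, so vocab[:100] would fail; most_common(100) gives the 100 most frequent tokens with their counts instead.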