FreqDist with NLTK

Tags: python, nlp, nltk

NLTK in Python has a class, FreqDist, which gives you the frequency of words within a text. I am trying to pass my text as an argument, but the result is of the form:

[' ', 'e', 'a', 'o', 'n', 'i', 't', 'r', 's', 'l', 'd', 'h', 'c', 'y', 'b', 'u', 'g', '\n', 'm', 'p', 'w', 'f', ',', 'v', '.', "'", 'k', 'B', '"', 'M', 'H', '9', 'C', '-', 'N', 'S', '1', 'A', 'G', 'P', 'T', 'W', '[', ']', '(', ')', '0', '7', 'E', 'J', 'O', 'R', 'j', 'x']

whereas in the example on the NLTK website the result was whole words, not just letters. I'm doing it this way:

    file_y = open(fileurl)
    p = file_y.read()
    fdist = FreqDist(p)
    vocab = fdist.keys()
    vocab[:100]

Do you know what I'm doing wrong? Thanks!


asked Jan 08 '11 16:01

afg102

1 Answer

FreqDist expects an iterable of tokens. A string is iterable, and its iterator yields one character at a time, which is why you are getting counts of letters rather than words.

Pass your text to a tokenizer first, and pass the resulting tokens to FreqDist.
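A minimal sketch of the difference, not necessarily the tokenizer you want for real text. It assumes NLTK 3 on Python 3 and uses `wordpunct_tokenize` because it needs no downloaded data. The sample string stands in for `file_y.read()`, and lowercasing is an extra choice of mine so that "The" and "the" count together. The `except ImportError` branch is only there so the snippet runs without NLTK installed: it swaps in `collections.Counter` (which `FreqDist` subclasses in NLTK 3) and a regex that I believe mirrors `wordpunct_tokenize`.

```python
try:
    from nltk import FreqDist
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Assumption: fallback for environments without NLTK.
    # In NLTK 3, FreqDist is a subclass of collections.Counter.
    import re
    from collections import Counter as FreqDist

    def wordpunct_tokenize(text):
        # Runs of word characters, or runs of punctuation.
        return re.findall(r"\w+|[^\w\s]+", text)

# Hypothetical sample text standing in for file_y.read()
text = "The cat sat on the mat. The dog sat too."

# Passing the raw string: FreqDist iterates it character by character.
char_counts = FreqDist(text)

# Tokenize first, then count: the keys are now whole tokens.
tokens = wordpunct_tokenize(text.lower())
word_counts = FreqDist(tokens)

# Most frequent tokens with their counts.
print(word_counts.most_common(3))
```

Note that on Python 3 `fdist.keys()` returns a view that cannot be sliced, so `most_common(100)` replaces the `vocab[:100]` idiom from the question. Punctuation tokens such as "." are counted too; filter them out before building the distribution if you only want words.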

answered Oct 10 '22 21:10

Alex Brasetvik
