Print the 10 most frequently occurring words of a text, including and excluding stopwords

I adapted this question from here, with my own changes. I have the following code:

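import nltk

def content_text(text):
    # Load the English stopword list once, as a set for fast lookups
    stopwords = set(nltk.corpus.stopwords.words('english'))
    # Keep only the content words, i.e. the words that are NOT stopwords
    content = [w for w in text if w.lower() not in stopwords]
    return content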

How can I print the 10 most frequently occurring words of a text, 1) including and 2) excluding stopwords?

asked Feb 08 '15 10:02

user2064809

1 Answer

NLTK provides a FreqDist class that counts how often each item occurs:

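import nltk

# text is assumed to hold the input string
allWords = nltk.tokenize.word_tokenize(text)
allWordDist = nltk.FreqDist(w.lower() for w in allWords)

# Compare the lowercased token so capitalized stopwords such as "The" are excluded too
stopwords = set(nltk.corpus.stopwords.words('english'))
allWordExceptStopDist = nltk.FreqDist(w.lower() for w in allWords if w.lower() not in stopwords)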

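To extract the 10 most common words:

# most_common(10) returns a list of (word, count) tuples, so unpack the words
mostCommon = [word for word, count in allWordDist.most_common(10)]
mostCommonExceptStop = [word for word, count in allWordExceptStopDist.most_common(10)]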
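Since FreqDist is a subclass of collections.Counter, the same counting logic can be tried out with the standard library alone. The following is only a minimal sketch, not the answer's exact method: the function name top_words, the regex tokenizer (used in place of word_tokenize), and the tiny STOPWORDS set are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set for illustration only;
# with NLTK you would use nltk.corpus.stopwords.words('english') instead.
STOPWORDS = {"the", "a", "an", "and", "of", "on", "in", "is", "to"}

def top_words(text, n=10, stopwords=None):
    """Return the n most frequent lowercased words as (word, count) pairs.

    If a stopword collection is given, those words are skipped.
    """
    # Simple regex tokenizer (an assumption): it stands in for
    # nltk.tokenize.word_tokenize and also discards punctuation.
    words = re.findall(r"[a-z']+", text.lower())
    if stopwords is not None:
        words = [w for w in words if w not in stopwords]
    return Counter(words).most_common(n)

sample = "The cat sat on the mat and the dog sat on the cat"
print(top_words(sample))                       # 1) including stopwords
print(top_words(sample, stopwords=STOPWORDS))  # 2) excluding stopwords
```

Passing the stopwords as an optional argument lets one function cover both cases from the question, and lowercasing before the membership test keeps capitalized stopwords from slipping through.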

answered Oct 13 '22 22:10

igorushi

Tags:

python

nltk

word-frequency

find-occurrences
