Python: word count from WordCloud

I am using WordCloud on a body of text, and I would like to see the actual counts for each word in the cloud. I can see the weighted frequencies using .words_, but is there an easy way to see the actual counts?

from wordcloud import WordCloud

# Generate a word cloud image
wordcloud = WordCloud(background_color="white").generate(text)
# Weighted (relative) frequencies, not raw counts
wordfreq = wordcloud.words_

Edit: I want the counts from WordCloud itself, rather than counting words in the text myself, because WordCloud includes phrases (collocations) as well as single words in its analysis. For example, it would report a count for "water resources" as well as a count for "water" wherever it appears outside "water resources". WordCloud also appears to fold plural forms into the count of the singular (e.g. counting "water resources" under "water resource").

Lauren D asked Sep 03 '25 14:09


1 Answer

Just use WordCloud().process_text(text):

>>> from wordcloud import WordCloud
>>> WordCloud().process_text('penn penn penn penn penn state state state state uni uni uni college college university states vice president vice president vice president vice president vice president vice president vice president')
{'penn': 5, 'state': 5, 'uni': 3, 'college': 2, 'university': 1, 'vice president': 7}

Notice that it combines "states" into the "state" count and also counts "vice president" as a bigram.
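
Once you have that dictionary, ranking the entries or comparing them with the weighted values is plain Python. The sketch below uses only the standard library and hard-codes the counts from the output above; in practice the dictionary would come from WordCloud().process_text(text). The normalization step is an assumption: it treats each .words_ value as the raw count divided by the largest count, so check it against your own .words_ output before relying on it.

```python
from collections import Counter

# Counts copied from the process_text output shown above; in practice:
#   counts = WordCloud().process_text(text)
counts = {'penn': 5, 'state': 5, 'uni': 3, 'college': 2,
          'university': 1, 'vice president': 7}

# Rank words and phrases by raw count, highest first.
ranked = Counter(counts).most_common()

# Assumption: the weighted value is the count divided by the largest count.
top = max(counts.values())
weights = {word: n / top for word, n in counts.items()}

for word, n in ranked:
    print(f"{word}: {n} (weight {weights[word]:.2f})")
```

Keeping the raw counts and the derived weights side by side makes it easy to see which phrases WordCloud merged or promoted to bigrams.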

nrubin29 answered Sep 05 '25 09:09