
Python: wordcloud, repetitive words

My word cloud shows some words more than once, and I do not understand why they are not counted together and shown as a single word.

import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
word_string = 'oh oh oh oh oh oh verse wrote book stand title book would life superman thats make feel count privilege love ideal honored know feel see everyday things things say rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock love rock oh oh oh verse try count ways make smile id run fingers run timeless things talk sugar keeps going make wanna keep lovin strong make wanna try best give want need give whole heart little piece minimum talking everything single wish talking every dream rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock wanna rock bridge theres options dont want theyre worth time cause oh thank like us fine rock sand smile cry joy pain truth lies matter know count oh oh oh oh oh oh rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock love rock oh oh oh oh oh oh wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock wanna rock party people people party popping sitting around see looking looking see look started lets hook little one one come give stuff let freshin ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture party people people party popping sitting around see looking looking see look started lets hook come one give stuff let freshin little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture lets hook come give stuff let freshin little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go lets hook come give stuff let freshin 
little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture black culture black culture black culture black culture'
wordcloud = WordCloud(background_color="white",
                      width=1200, height=1000,
                      stopwords=STOPWORDS
                      ).generate(word_string)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

As you can see, words like love, oh, rock, black and culture appear several times, so it seems that they are not counted together. What am I doing wrong?


Alina asked May 13 '17 14:05


2 Answers

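That is caused by a feature called 'collocations' in the word_cloud project: by default the cloud also includes frequent two-word phrases (bigrams), so a word can show up both on its own and inside a phrase. You can turn it off by setting collocations=False, like this: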

    wordcloud = WordCloud(collocations=False).generate(word_string)

This removes the two-word phrases, so every word is counted and drawn only once. It gets rid of phrases you probably don't want, for instance "oh oh", but also of some you might have liked to keep, for instance "black culture".
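To check the effect, you can look at the frequency table the cloud was built from. The snippet below is only a minimal sketch: the helper name single_word_frequencies is made up for illustration, and it assumes the wordcloud package is installed.

```python
def single_word_frequencies(text):
    """Build a cloud without bigrams and return its frequency table.

    Hypothetical helper for illustration; not part of the wordcloud package.
    """
    # Imported inside the function so that defining it does not need the package.
    from wordcloud import WordCloud, STOPWORDS

    cloud = WordCloud(background_color="white",
                      stopwords=STOPWORDS,
                      collocations=False).generate(text)
    # words_ maps each drawn word to its relative frequency.
    return cloud.words_
```

Calling it with word_string from the question should return a table in which no key contains a space, i.e. no "oh oh" or "black culture".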

craigching answered Sep 20 '22 05:09


If you look at wordcloud.words_ you will see the frequency table includes some two-word phrases like 'oh oh', 'hook start', 'lets go', 'lets hook'.

You would need to dig into the code behind .process_text() to see exactly why it does this.

As a work-around you could split word_string yourself to build a word-frequency table, then use .generate_from_frequencies() to create the image.
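A rough sketch of that work-around follows. The helper names count_words and render are made up for illustration, the short sample string stands in for word_string, and splitting on whitespace is an assumption that only suits text that is already cleaned, as in the question.

```python
from collections import Counter


def count_words(text, stopwords=frozenset()):
    """Count single whitespace-separated words, skipping stopwords."""
    return Counter(w for w in text.lower().split() if w not in stopwords)


def render(frequencies):
    """Draw a cloud from a ready-made word -> count table."""
    # Imported here so the counting part runs even without wordcloud installed.
    from wordcloud import WordCloud

    return WordCloud(background_color="white",
                     width=1200,
                     height=1000).generate_from_frequencies(frequencies)


# Short stand-in for the question's word_string.
sample = "rock love rock oh oh oh black culture black culture"
frequencies = count_words(sample)
print(frequencies.most_common(3))
```

Because the table is built from single words only, no two-word phrase can end up in the image. Pass the result to render(frequencies) and show it with plt.imshow() as in the question.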

Hugh Bothwell answered Sep 21 '22 05:09