 

Word cloud does not show the frequency of the words correctly

I have plotted my text data in a word cloud. This is the data frame I have:

vocab      sumCI
aid        3
tinnitu    3
sudden     3
squamou    3
saphen     3
problem    3
prednison  3
pain       2
dysuria    3
cancer     2

Then I transformed it into a string like this (that is, I repeated each word as many times as it occurs according to my data frame and then fed the result to the function):

aid aid aid tinnitu tinnitu tinnitu sudden sudden sudden squamou squamou squamou
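Before blaming the plotting step, it is worth checking that the repeated string really carries the counts from the data frame. A minimal standard-library sketch, using the numbers from the table above (the names counts, text and recovered are illustrative only):

```python
from collections import Counter

# Counts copied from the table in the question (vocab -> sumCI).
counts = {
    "aid": 3, "tinnitu": 3, "sudden": 3, "squamou": 3, "saphen": 3,
    "problem": 3, "prednison": 3, "pain": 2, "dysuria": 3, "cancer": 2,
}

# Repeat each word as many times as its count and join with spaces,
# mirroring the string the question builds from the data frame.
text = " ".join(" ".join([word] * n) for word, n in counts.items())

# Counting the words again recovers exactly the original numbers.
recovered = Counter(text.split())
print(recovered == Counter(counts))  # True
```

If this prints True, the input text is not the problem, and the size differences most likely come from how the word cloud lays the words out.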

Then I used this code to visualize the text data:

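import matplotlib.pyplot as plt
import pandas as pd
from wordcloud import WordCloud

def generate_wordcloud(text):  # optionally add: stopwords=STOPWORDS and change the arg below
    wordcloud = WordCloud(
        background_color="white",
        width=1200, height=1000,
        relative_scaling=1.0,
        collocations=False,  # count single words only, no two-word phrases
    ).generate(text)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()


# The data frame from the table above.
cidf = pd.DataFrame({
    "vocab": ["aid", "tinnitu", "sudden", "squamou", "saphen",
              "problem", "prednison", "pain", "dysuria", "cancer"],
    "sumCI": [3, 3, 3, 3, 3, 3, 3, 2, 3, 2],
})

# Repeat each row sumCI times, then join the words into one string.
cidf = cidf.loc[cidf.index.repeat(cidf['sumCI'])].reset_index(drop=True)
strCI = ' '.join(cidf['vocab'])
print(strCI)
generate_wordcloud(strCI)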

The result looks like this (screenshot: word_cloud_pic).

As you can see, all words occur 2 or 3 times, but their sizes in the word cloud do not reflect this. Even words with the same frequency differ a lot in size!

For example, look at "tinnitu" and "dysuria" in this data frame: both have a frequency of 3, yet "tinnitu" is quite big while "dysuria" is so small that it is very hard to find.
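To measure the effect instead of eyeballing it, one can read the font sizes back from a fitted cloud. The sketch below assumes the fitted WordCloud.layout_ attribute, which recent wordcloud releases document as a list of ((word, freq), font_size, position, orientation, color) tuples, with frequencies normalized to the most frequent word. size_range_by_frequency is a hypothetical helper name.

```python
from collections import defaultdict


def size_range_by_frequency(layout):
    """Return {freq: (smallest_font_size, largest_font_size)}.

    Assumes entries shaped like ((word, freq), font_size, position,
    orientation, color), as documented for WordCloud.layout_.
    """
    sizes = defaultdict(list)
    for (word, freq), font_size, *rest in layout:
        sizes[freq].append(font_size)
    return {freq: (min(s), max(s)) for freq, s in sizes.items()}
```

For example, size_range_by_frequency(WordCloud(...).generate(strCI).layout_) would report, per normalized frequency, the smallest and largest font size those words received; a wide range for 1.0 (frequency 3) is exactly the symptom described above.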

Thanks :)

asked Feb 07 '19 04:02 by sariii



1 Answer

Well, I figured it out after searching a lot. I ended up using generate_from_frequencies(), which takes a dict mapping each word to its count, instead of generate(). However, even then, words with the same frequency are not all given the same size.
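A minimal sketch of the generate_from_frequencies() approach, assuming the counts come from the original data frame (before the index.repeat step in the question's code). frequencies_from_columns and render are hypothetical helper names; wordcloud and matplotlib are imported inside render so the conversion helper can be used on its own.

```python
def frequencies_from_columns(words, counts):
    """Build the {word: count} dict that generate_from_frequencies() takes.

    Repeated words are summed rather than overwritten.
    """
    freqs = {}
    for word, count in zip(words, counts):
        freqs[word] = freqs.get(word, 0) + int(count)
    return freqs


def render(frequencies):
    # Imported lazily so frequencies_from_columns() works without these packages.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    wc = WordCloud(
        background_color="white",
        width=1200, height=1000,
        relative_scaling=1.0,
    ).generate_from_frequencies(frequencies)
    plt.imshow(wc)
    plt.axis("off")
    plt.show()
```

Called as render(frequencies_from_columns(cidf["vocab"], cidf["sumCI"])). Because the counts are passed in directly, there is no text-processing step (tokenizing, stopword removal) between the data frame and the sizes.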

The documentation also mentions ranking, i.e. the order in which words are processed. This is the part I really cannot understand; I think it would be better as an option. For example, when the algorithm sees words with the same frequency, the user could choose between 1. sizing them based on their order, or 2. doing nothing and giving them all the same size.
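The wordcloud documentation describes relative_scaling roughly like this: with relative_scaling=0 only word ranks are considered, and with relative_scaling=1 a word that is twice as frequent gets twice the size. The function below is a simplified model of that per-word update, based on my reading of the library source; it is an assumption, not the library's API, and details may differ between versions. Words are processed from most to least frequent, and each size is derived from the previous word's size.

```python
def next_font_size(prev_size, freq, prev_freq, relative_scaling):
    """Simplified model (an assumption, not the library's API) of how a
    word's font size follows from the previous, more frequent word."""
    ratio = freq / prev_freq
    # Blend the frequency ratio with 1 according to relative_scaling.
    return int(round((relative_scaling * ratio + (1 - relative_scaling)) * prev_size))


# Equal frequencies: ratio 1, so the previous size is simply carried over.
print(next_font_size(100, 3, 3, 1.0))  # 100
# relative_scaling=1.0: 2/3 of the frequency gives 2/3 of the size.
print(next_font_size(120, 2, 3, 1.0))  # 80
# relative_scaling=0: the frequency ratio is ignored and the size is kept.
print(next_font_size(120, 2, 3, 0.0))  # 120
```

The first case is the important one here: for equal frequencies the update does not reset the size, it reuses whatever size the previous word ended up with.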

Based on my research and output, even when words have the same frequency, the algorithm may change their sizes depending on how much space is left, which is not good.
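That space-dependent behaviour can be illustrated with a toy model. The sketch is an assumption for illustration, not the real algorithm: instead of placing words on a 2-D canvas, each word costs len(word) * size units of a fixed capacity. What it tries to keep from the library's approach, as I understand it, is that words are sized in frequency order, a word that does not fit is shrunk step by step, and the shrunken size carries over to the next word. toy_layout, capacity and start_size are hypothetical names, and the numbers are toy values, not a reproduction of the plot in the question.

```python
def toy_layout(frequencies, capacity, start_size, relative_scaling=1.0,
               font_step=1, min_font_size=4):
    """Toy model of frequency-ordered sizing with shrink-to-fit.

    Hypothetical, for illustration only: a word of length n at font size s
    "costs" n * s units of a fixed capacity instead of real pixel placement.
    """
    # Most frequent first; sorted() is stable, so ties keep their input order.
    items = sorted(frequencies.items(), key=lambda kv: kv[1], reverse=True)
    if not items:
        return {}
    remaining = capacity
    size = start_size
    prev_freq = items[0][1]
    sizes = {}
    for word, freq in items:
        # Scale relative to the previous word; a tie keeps the current size.
        size = int(round((relative_scaling * freq / prev_freq
                          + (1 - relative_scaling)) * size))
        # Shrink until the word fits; the reduced size carries over to later words.
        while size >= min_font_size and len(word) * size > remaining:
            size -= font_step
        if size < min_font_size:
            break  # no room left: this word and all later ones are dropped
        remaining -= len(word) * size
        sizes[word] = size
        prev_freq = freq
    return sizes


print(toy_layout({"tinnitu": 3, "sudden": 3, "dysuria": 3},
                 capacity=170, start_size=10))
# {'tinnitu': 10, 'sudden': 10, 'dysuria': 5}
```

In this toy run "dysuria" ends up at half the size of "tinnitu" despite the equal frequency, only because less room was left when its turn came. A larger capacity or a smaller start_size avoids the shrinking in the model; the closest knobs on WordCloud are presumably width/height and max_font_size, but that is an assumption worth testing.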

These observations are based only on my own experiments and on reading the documentation.

answered Oct 22 '22 07:10 by sariii