
Wordcloud Python with generate_from_frequencies

I'm trying to create a wordcloud from a CSV file. As an example, the file has the following structure:

a,1
b,2
c,4
j,20

The real file has about 1,800 rows. The first column holds strings (names) and the second column holds their respective frequencies (integers). The file is read and each key,value row is stored in a dictionary (d), which is used later to plot the wordcloud:

reader = csv.reader(open('namesDFtoCSV', 'r',newline='\n'))
d = {}
for k,v in reader:
    d[k] = v

Once we have the dictionary full of values, I try to plot the wordcloud:

#Generating wordcloud. Relative scaling value is to adjust the importance of a frequency word.
#See documentation: https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py
wordcloud = WordCloud(width=900,height=500, max_words=1628,relative_scaling=1,normalize_plurals=False).generate_from_frequencies(d)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

But an error is thrown:

    Traceback (most recent call last):
    File ".........../script.py", line 19, in <module>
    wordcloud = WordCloud(width=900,height=500, max_words=1628,relative_scaling=1,normalize_plurals=False).generate_from_frequencies(d)
    File "/usr/local/lib/python3.5/dist-packages/wordcloud/wordcloud.py", line  360, in generate_from_frequencies
    for word, freq in frequencies]
    File "/usr/local/lib/python3.5/dist-packages/wordcloud/wordcloud.py", line 360, in <listcomp>
    for word, freq in frequencies]
    TypeError: unsupported operand type(s) for /: 'str' and 'float'

Finally, the documentation says:

def generate_from_frequencies(self, frequencies, max_font_size=None):
    """Create a word_cloud from words and frequencies.
    Parameters
    ----------
    frequencies : dict from string to float
        A contains words and associated frequency.
    max_font_size : int
        Use this font-size instead of self.max_font_size
    Returns
    -------
    self
    """

So I don't understand why it is throwing this error if I meet the requirements of the function. I hope someone can help me, thanks.

**Note**

I am using wordcloud 1.3.1.

cmc_carlos asked Mar 27 '17 10:03


2 Answers

This is because the values in your dictionary are strings, but wordcloud expects integers or floats. csv.reader returns every field as a string, including the numeric ones.

After I run your code and inspect your dictionary d, I get the following:

In [12]: d

Out[12]: {'a': '1', 'b': '2', 'c': '4', 'j': '20'}

Note the quotes around the numbers: they mean these values are really strings.

A hacky way to resolve this is to cast v to an int in your for loop, like:
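To check this without the real file, here is a minimal sketch that feeds the four sample rows from the question through csv.reader. The in-memory `sample` object is my stand-in for 'namesDFtoCSV', not something from the original script.

```python
import csv
import io

# Stand-in for the real file: the four sample rows from the question.
sample = io.StringIO("a,1\nb,2\nc,4\nj,20\n", newline="")

d = {}
for k, v in csv.reader(sample):
    d[k] = v  # no conversion, exactly as in the question

print(d)                                        # {'a': '1', 'b': '2', 'c': '4', 'j': '20'}
print({type(v).__name__ for v in d.values()})   # {'str'}
```

Every value comes back as str, which is what generate_from_frequencies later trips over.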

d[k] = int(v)

I say this is hacky because it works only when every value is an integer. If your input contains floats, int(v) raises a ValueError (for example, int('1.5') fails), so use float(v) in that case.
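If the file might contain floats, blank lines, or a header row, a slightly more defensive loader could look like the sketch below. `load_frequencies` is a hypothetical helper I made up for illustration (it is not part of wordcloud), and silently skipping bad rows is just one possible policy.

```python
import csv
import io


def load_frequencies(csv_file):
    """Read name,frequency rows and return a dict mapping str to float.

    Hypothetical helper for illustration; not part of the wordcloud package.
    """
    frequencies = {}
    for row in csv.reader(csv_file):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        name, raw = row
        try:
            frequencies[name] = float(raw)  # accepts both "20" and "2.5"
        except ValueError:
            continue  # skip rows whose second field is not numeric, e.g. a header
    return frequencies


# In-memory sample with a header, a float, and a blank line.
sample = io.StringIO("name,count\na,1\nb,2.5\n\nj,20\n", newline="")
freqs = load_frequencies(sample)
print(freqs)  # {'a': 1.0, 'b': 2.5, 'j': 20.0}
```

With a real file you would pass an open file object instead of `sample`, and the resulting dictionary can go straight into generate_from_frequencies.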

Also, Python errors can be difficult to read. The two lines of your traceback that matter most are

File ".........../script.py", line 19

TypeError: unsupported operand type(s) for /: 'str' and 'float'

which can be interpreted as: "There's a type error on or before line 19 of my file. Let me look at my data types to see if there is any mismatch between string and float..."
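The same message can be reproduced in isolation. In this sketch I am assuming that the list comprehension shown in the traceback divides each frequency by a float; `max_frequency` is an illustrative name and value of mine, not taken from the traceback.

```python
d = {"a": "1", "j": "20"}  # string values, as in the question
max_frequency = 20.0       # illustrative float divisor (an assumption)

message = None
try:
    # Dividing a str by a float is what raises the error.
    normalized = [(word, freq / max_frequency) for word, freq in d.items()]
except TypeError as exc:
    message = str(exc)

print(message)  # unsupported operand type(s) for /: 'str' and 'float'
```

Once the values are numbers, the same comprehension runs without complaint.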

The code below works for me:

import csv
from wordcloud import WordCloud
import matplotlib.pyplot as plt

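d = {}
# newline='' is what the csv module documentation recommends when opening files.
with open('namesDFtoCSV', 'r', newline='') as f:
    for k, v in csv.reader(f):
        d[k] = int(v)  # csv.reader yields strings, so convert the frequency to a number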

#Generating wordcloud. Relative scaling value is to adjust the importance of a frequency word.
#See documentation: https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py
wordcloud = WordCloud(width=900,height=500, max_words=1628,relative_scaling=1,normalize_plurals=False).generate_from_frequencies(d)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

RandomTask answered Oct 07 '22 03:10


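import wordcloud

# The original fragment is a function body (it ends in a return statement).
# The function name and both parameters below are assumptions added to make it runnable.
def calculate_frequencies(file_contents, uninteresting_words):
    # LEARNER CODE START HERE
    # Keep only letters and whitespace so punctuation and digits are dropped.
    file_c = ""
    for char in file_contents:
        if char.isalpha() or char.isspace():
            file_c += char
    # Count each remaining word case-insensitively, skipping uninteresting ones.
    frequency = {}
    for word in file_c.split():
        word = word.lower()
        if word not in uninteresting_words:
            frequency[word] = frequency.get(word, 0) + 1
    #wordcloud
    # The counts are ints, which generate_from_frequencies accepts.
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(frequency)
    return cloud.to_array()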

Wasim Bhat answered Oct 07 '22 01:10