I have a Pandas dataframe with one column: Crime type. The column contains 16 different "categories" of crime, which I would like to visualise as a word cloud, with words sized based on their frequency within the dataframe.
I have attempted to do this with the following code:
To bring the data in:
fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)
To generate the word cloud:
wordcloud2 = WordCloud().generate(text2)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
However, I get this error:
TypeError: expected string or bytes-like object
I was able to create an earlier word cloud from the full dataset, using the following code, but I want the word cloud to only generate words from the specific column, 'crime type' ('allCrime.csv' contains approx. 13 columns):
text = open('allCrime.csv').read()
wordcloud = WordCloud().generate(text)
# Generate plot
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
I'm new to Python and Pandas (and coding generally!) so all help is gratefully received.
The problem is that the WordCloud.generate method you are using expects a single string, in which it counts the word instances, but you are passing it a pandas DataFrame (pd.read_csv returns a DataFrame even when usecols selects only one column).
Depending on what you want the word cloud to be built from, you can either join the column into one string:
wordcloud2 = WordCloud().generate(' '.join(text2['Crime type']))
which concatenates all the words in your dataframe column and then counts all instances,
or use WordCloud.generate_from_frequencies to pass in word frequencies that you have computed yourself.
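As a rough sketch of the second option: the helper names below (crime_frequencies, plot_category_cloud) and the sample values are made up for illustration and are not part of the original answer. The idea is to count each category yourself and hand the counts to generate_from_frequencies, so that a multi-word category such as "Anti-social behaviour" is sized as one phrase instead of being split into separate words. The plotting imports sit inside the function only so that the counting helper can be used on its own.

```python
from collections import Counter


def crime_frequencies(values):
    """Count how often each category string appears, skipping missing values."""
    # Only real, non-empty strings are counted; NaN (a float) is ignored.
    return Counter(v.strip() for v in values if isinstance(v, str) and v.strip())


def plot_category_cloud(frequencies):
    """Draw a word cloud sized by the given {category: count} mapping."""
    # Imported here so the counting helper works without these packages.
    from wordcloud import WordCloud
    import matplotlib.pyplot as plt

    cloud = WordCloud().generate_from_frequencies(frequencies)
    plt.imshow(cloud)
    plt.axis("off")
    plt.show()


# Hypothetical sample standing in for text2['Crime type']
sample = [
    "Burglary",
    "Anti-social behaviour",
    "Burglary",
    "Vehicle crime",
    "Anti-social behaviour",
    "Burglary",
]
freqs = crime_frequencies(sample)
print(freqs.most_common())
```

With the real data this would presumably be called as plot_category_cloud(crime_frequencies(text2['Crime type'])); in pandas, text2['Crime type'].value_counts().to_dict() gives an equivalent mapping.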
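fields = ['Crime type']
df = pd.read_csv('allCrime.csv', usecols=fields)
# Join the values into one string; str() on a large array gives a truncated repr with '...'
text = ' '.join(df['Crime type'].astype(str))
wordcloud = WordCloud().generate(text)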
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
You need to create a single concatenated input text. This can be done with the join function.
fields = ['Crime type']
text2 = pd.read_csv('allCrime.csv', usecols=fields)
text3 = ' '.join(text2['Crime type'])
wordcloud2 = WordCloud().generate(text3)
# Generate plot
plt.imshow(wordcloud2)
plt.axis("off")
plt.show()
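One caveat with the join approach: str.join raises a TypeError if the column contains missing values, because pandas stores them as NaN floats. A minimal sketch of a safer join, assuming it is acceptable to simply drop empty rows (the inline CSV below is invented sample data, not the real allCrime.csv):

```python
import io

import pandas as pd

# Invented sample standing in for allCrime.csv; the second row has no crime type.
csv_data = io.StringIO(
    "Crime type,Month\n"
    "Burglary,2017-01\n"
    ",2017-01\n"
    "Robbery,2017-02\n"
    "Burglary,2017-02\n"
)

df = pd.read_csv(csv_data, usecols=["Crime type"])

# Drop missing values, make sure everything is a string, then join into one text.
crime_text = " ".join(df["Crime type"].dropna().astype(str))
print(crime_text)
```

The resulting string can then be passed to WordCloud().generate(crime_text) as in the code above.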
You can also generate a word cloud for a single column while removing all the stop words. Say your dataframe is df and the column name is comment; then the following code can help:
# Final word cloud after all the cleaning and pre-processing
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

comment_words = ' '
stopwords = set(STOPWORDS)

# iterate through the column
for val in df.comment:
    # typecast each val to string
    val = str(val)
    # split the value
    tokens = val.split()
    # convert each token to lowercase
    for i in range(len(tokens)):
        tokens[i] = tokens[i].lower()
    # append the tokens to the running text
    for words in tokens:
        comment_words = comment_words + words + ' '

wordcloud = WordCloud(width=800, height=800,
                      background_color='white',
                      stopwords=stopwords,
                      min_font_size=10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()