I would like to make a word frequency distribution, with the words on the x-axis and the frequency count on the y-axis.
I have the following list:
example_list = [('dhr', 17838), ('mw', 13675), ('wel', 5499), ('goed', 5080),
('contact', 4506), ('medicatie', 3797), ('uur', 3792),
('gaan', 3473), ('kwam', 3463), ('kamer', 3447),
('mee', 3278), ('gesprek', 2978)]
I first tried converting it into a pandas DataFrame and then using its hist() method,
as in the example below, but I can't figure it out. I think it is actually straightforward and I'm probably just missing something.
import numpy as np
import matplotlib.pyplot as plt
word = []
frequency = []
for i in range(len(example_list)):
    word.append(example_list[i][0])
    frequency.append(example_list[i][1])
plt.bar(word, frequency, color='r')
plt.show()
How to count item frequency in a tuple?
You can use the Python tuple count() method to count how often an element occurs in a tuple. Pass the element whose occurrences you want to count as the argument.
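As a minimal sketch of tuple.count(), assuming a small made-up tuple of words (the sample data is hypothetical, not taken from the question): count() answers the question for a single element, while collections.Counter from the standard library tallies every element at once and yields (word, count) pairs in the same shape as example_list.

```python
from collections import Counter

# Hypothetical sample data, only for illustration
words = ('dhr', 'mw', 'dhr', 'wel', 'dhr', 'mw')

# tuple.count() returns the number of occurrences of one element
dhr_count = words.count('dhr')

# Counter tallies all elements; most_common() sorts by descending count
counts = Counter(words).most_common()

print(dhr_count)
print(counts)
```

The list returned by most_common() can be plotted in the same way as the already counted example_list.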
Create Histogram
In Matplotlib, we use the hist() function to create histograms. The hist() function takes an array of numbers as its argument and bins those numbers to build the histogram.
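A short sketch of what hist() does, assuming a small made-up array of numbers (hypothetical data): it groups raw numeric values into bins and counts how many fall into each bin. That is why it is not a natural fit for the question above, where the counts per word are already known and a bar chart is the more direct choice.

```python
import matplotlib.pyplot as plt

# Hypothetical raw numeric observations, only for illustration
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]

fig, ax = plt.subplots()

# hist() bins the raw values and returns the per-bin counts,
# the bin edges, and the drawn bar patches
bin_counts, bin_edges, patches = ax.hist(values, bins=5, color='r')

ax.set_xlabel('value')
ax.set_ylabel('count')
```

Call plt.show() afterwards to display the figure in an interactive session.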
Using pandas:
import pandas as pd
import matplotlib.pyplot as plt
example_list = [('dhr', 17838), ('mw', 13675), ('wel', 5499), ('goed', 5080), ('contact', 4506), ('medicatie', 3797), ('uur', 3792), ('gaan', 3473), ('kwam', 3463), ('kamer', 3447), ('mee', 3278), ('gesprek', 2978)]
df = pd.DataFrame(example_list, columns=['word', 'frequency'])
df.plot(kind='bar', x='word')
plt.show()
In older versions of Matplotlib (before 2.1) you can't pass the words into matplotlib.pyplot.bar directly. However, you can create an array of indices for bar and then replace these indices with the words using matplotlib.pyplot.xticks:
import numpy as np
import matplotlib.pyplot as plt
indices = np.arange(len(example_list))
plt.bar(indices, frequency, color='r')
plt.xticks(indices, word, rotation='vertical')
plt.tight_layout()
plt.show()
The for-loop that creates word and frequency could also be replaced by a simple zip and list unpacking:
word, frequency = zip(*example_list)
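Putting the pieces together, here is a self-contained sketch that combines the zip unpacking with the index-based bar chart. It uses the list from the question; the names words, frequencies, fig and ax are my own choices, and the object-oriented ax.bar / ax.set_xticks calls are swapped in for the plt.bar / plt.xticks calls above.

```python
import matplotlib.pyplot as plt

example_list = [('dhr', 17838), ('mw', 13675), ('wel', 5499), ('goed', 5080),
                ('contact', 4506), ('medicatie', 3797), ('uur', 3792),
                ('gaan', 3473), ('kwam', 3463), ('kamer', 3447),
                ('mee', 3278), ('gesprek', 2978)]

# zip(*...) transposes the list of pairs into two tuples
words, frequencies = zip(*example_list)

# One integer position per word on the x-axis
indices = range(len(words))

fig, ax = plt.subplots()
ax.bar(indices, frequencies, color='r')

# Replace the integer tick positions with the words themselves
ax.set_xticks(indices)
ax.set_xticklabels(words, rotation='vertical')
ax.set_ylabel('frequency')
fig.tight_layout()
```

Call plt.show() at the end to display the chart. Note that zip returns tuples rather than lists, which is fine for plotting.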