I have a long list of words, and I want to generate a histogram of the frequency of each word in my list. I was able to do that in the code below:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
word_list = ['A','A','B','B','A','C','C','C','C']
counts = Counter(word_list)
labels, values = zip(*counts.items())
indexes = np.arange(len(labels))
plt.bar(indexes, values)
plt.show()
It doesn't, however, display the bins by rank (i.e. by frequency, so the highest frequency is the first bin on the left and so on), even though when I print counts it orders them for me: Counter({'C': 4, 'A': 3, 'B': 2}). How could I achieve that?
The vertical axis of a histogram represents the count or frequency that a data value occurs in each of the bins. The higher the bar is, the more data values fall into this range of bin values.
A histogram is a chart that plots the distribution of a numeric variable's values as a series of bars. Each bar typically covers a range of numeric values called a bin or class; a bar's height indicates the frequency of data points with a value within the corresponding bin.
A frequency histogram is a graphical version of a frequency distribution where the width and position of rectangles are used to indicate the various classes, with the heights of those rectangles indicating the frequency with which data fell into the associated class.
You can achieve the desired output by sorting your data first and then passing the ordered arrays to bar; below I use numpy.argsort for that. I also added the labels to the bars.
Here is the code that produces the plot with a few inline comments:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
counts = Counter(word_list)
labels, values = zip(*counts.items())
# sort your values in descending order
indSort = np.argsort(values)[::-1]
# rearrange your data
labels = np.array(labels)[indSort]
values = np.array(values)[indSort]
indexes = np.arange(len(labels))
# bars are centered on their x positions by default (matplotlib >= 2.0)
plt.bar(indexes, values)
# add labels at the same positions so each one sits under its bar
plt.xticks(indexes, labels)
plt.show()
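As an alternative to the numpy.argsort step, the sorting can also be left to Counter itself: most_common() returns the (word, count) pairs already ordered from highest to lowest count. The following is only a sketch of that idea; the helper names ranked_counts and plot_ranked are made up for illustration, and the code assumes a non-empty word list.

```python
from collections import Counter


def ranked_counts(words):
    """Return (labels, values) sorted by count, most frequent first.

    Hypothetical helper; assumes `words` is non-empty.
    """
    # most_common() yields (word, count) pairs in descending count order
    pairs = Counter(words).most_common()
    labels, values = zip(*pairs)
    return labels, values


def plot_ranked(words):
    """Draw a bar chart of the word counts, ranked by frequency."""
    # imported here so ranked_counts() stays usable without matplotlib
    import matplotlib.pyplot as plt

    labels, values = ranked_counts(words)
    positions = list(range(len(labels)))
    plt.bar(positions, values)
    plt.xticks(positions, labels)
    plt.show()


word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
labels, values = ranked_counts(word_list)
print(labels, values)  # ('C', 'A', 'B') (4, 3, 2)
```

Calling plot_ranked(word_list) should then draw the same ranked bar chart as the code above. One difference to be aware of: for words with equal counts, most_common() keeps the order in which they were first encountered (Python 3.7 and later), whereas reversing the argsort result may order ties differently.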
In case you want to plot only the first n entries, you can replace the line
counts = Counter(word_list)
by
counts = dict(Counter(word_list).most_common(n))
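In the case above, counts would then be {'C': 4, 'A': 3} for n = 2 (dictionaries preserve insertion order in Python 3.7 and later, so the most frequent word comes first).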
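To make the top-n selection concrete, here is a small sketch using the same word_list; the variable name top is just for illustration. It shows what most_common(n) returns and that converting the result to a dict keeps the ranking on Python 3.7 and later.

```python
from collections import Counter

word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
n = 2

# most_common(n) returns only the n highest-count (word, count) pairs
top = Counter(word_list).most_common(n)
print(top)  # [('C', 4), ('A', 3)]

# dict() keeps the pairs in the order they were inserted (Python 3.7+)
counts = dict(top)
print(list(counts))  # ['C', 'A']

# the pairs can also be unpacked directly, without going through a dict
labels, values = zip(*top)
```

Since the pairs already arrive sorted, labels and values could be passed straight to plt.bar and plt.xticks without a separate sorting step.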
If you would like to remove the frame of the plot and label the bars directly, you can check this post.
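For orientation, one possible way to do this is sketched below. It is not taken from the linked post; the function name plot_labeled_bars is hypothetical, and the approach (hiding the axes spines and writing each count above its bar with ax.text) is just one option among several.

```python
from collections import Counter

import matplotlib.pyplot as plt


def plot_labeled_bars(words):
    """Ranked bar chart without a frame, with counts written on the bars.

    Hypothetical helper; assumes `words` is non-empty.
    """
    labels, values = zip(*Counter(words).most_common())
    positions = list(range(len(labels)))

    fig, ax = plt.subplots()
    bars = ax.bar(positions, values)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)

    # hide the frame (the four spines around the plot area)
    for spine in ax.spines.values():
        spine.set_visible(False)
    # the counts are written on the bars, so the y ticks are not needed
    ax.set_yticks([])

    # write each count just above the top center of its bar
    for bar, value in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(),
                str(value), ha='center', va='bottom')
    return ax


if __name__ == "__main__":
    plot_labeled_bars(['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C'])
    plt.show()
```

Newer matplotlib releases also provide Axes.bar_label for the labeling step; ax.text is used here only because it does not depend on a recent version.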