Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to generate a word frequency histogram, where bars are ordered according to their height

I have a long list of words, and I want to generate a histogram of the frequency of each word in my list. I was able to do that in the code below:

import csv
from collections import Counter
import numpy as np

word_list = ['A','A','B','B','A','C','C','C','C']

counts = Counter(merged)

labels, values = zip(*counts.items())

indexes = np.arange(len(labels))

plt.bar(indexes, values)
plt.show()

It doesn't, however, display the bins by rank (i.e. by frequency, so highest frequency is first bin on the left and so on), even though when I print counts it orders them for me Counter({'C': 4, 'A': 3, 'B': 2}). How could I achieve that?

like image 799
BKS Avatar asked Feb 24 '16 07:02

BKS


People also ask

What is determined from the height of the bar in a relative frequency histogram?

The vertical axis of a histogram represents the count or frequency that a data value occurs in each of the bins. The higher the bar is, the more data values fall into this range of bin values.

What does a the height of a column in a frequency histogram represent?

A histogram is a chart that plots the distribution of a numeric variable's values as a series of bars. Each bar typically covers a range of numeric values called a bin or class; a bar's height indicates the frequency of data points with a value within the corresponding bin.

What is a frequency histogram?

A frequency histogram is a graphical version of a frequency distribution where the width and position of rectangles are used to indicate the various classes, with the heights of those rectangles indicating the frequency with which data fell into the associated class, as the example below suggests.


1 Answers

You can achieve the desired output by sorting your data first and then pass the ordered arrays to bar; below I use numpy.argsort for that. The plot then looks as follows (I also added the labels to the bar):

enter image description here

Here is the code that produces the plot with a few inline comments:

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']

counts = Counter(word_list)

labels, values = zip(*counts.items())

# sort your values in descending order
indSort = np.argsort(values)[::-1]

# rearrange your data
labels = np.array(labels)[indSort]
values = np.array(values)[indSort]

indexes = np.arange(len(labels))

bar_width = 0.35

plt.bar(indexes, values)

# add labels
plt.xticks(indexes + bar_width, labels)
plt.show()

In case you want to plot only the first n entries, you can replace the line

counts = Counter(word_list)

by

counts = dict(Counter(word_list).most_common(n))

In the case above, counts would then be

{'A': 3, 'C': 4}

for n = 2.

If you like to remove the frame of the plot and label the bars directly, you can check this post.

like image 173
Cleb Avatar answered Oct 03 '22 03:10

Cleb