How to generate a word frequency histogram, where bars are ordered according to their height

I have a long list of words, and I want to generate a histogram of the frequency of each word in my list. I was able to do that with the code below:

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

word_list = ['A','A','B','B','A','C','C','C','C']

# count how often each word occurs
counts = Counter(word_list)

labels, values = zip(*counts.items())

indexes = np.arange(len(labels))

plt.bar(indexes, values)
plt.show()

It doesn't, however, display the bars by rank (i.e. by frequency, with the highest-frequency word as the first bar on the left, and so on), even though printing counts shows them in that order: Counter({'C': 4, 'A': 3, 'B': 2}). How can I achieve that?

799

asked Feb 24 '16 07:02

BKS

1 Answer

You can achieve the desired output by sorting your data first and then passing the ordered arrays to bar; below I use numpy.argsort for that. I also added the labels to the bars.

[Figure: bar chart of the word counts, with bars in descending order of height]

Here is the code that produces the plot with a few inline comments:

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt

word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']

counts = Counter(word_list)

labels, values = zip(*counts.items())

# sort your values in descending order
indSort = np.argsort(values)[::-1]

# rearrange your data
labels = np.array(labels)[indSort]
values = np.array(values)[indSort]

indexes = np.arange(len(labels))

# draw the bars centered on their x positions
plt.bar(indexes, values, align='center')

# add labels under the bars
plt.xticks(indexes, labels)
plt.show()
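
As an alternative to numpy.argsort, Counter.most_common() called without an argument already returns the (word, count) pairs sorted by count in descending order. The following is only a sketch of that idea: the helper names sorted_counts and plot_sorted_counts are made up for illustration, and it assumes the word list is not empty.

```python
from collections import Counter


def sorted_counts(words):
    """Return (labels, values), ordered from most to least frequent.

    Hypothetical helper; assumes `words` is not empty.
    """
    # most_common() with no argument returns every (word, count) pair,
    # sorted by count in descending order
    pairs = Counter(words).most_common()
    labels, values = zip(*pairs)
    return labels, values


def plot_sorted_counts(words):
    """Draw a bar chart whose bars are ordered by height (needs matplotlib)."""
    # imported here so that sorted_counts() can be used without matplotlib
    import matplotlib.pyplot as plt

    labels, values = sorted_counts(words)
    positions = list(range(len(labels)))
    plt.bar(positions, values, align='center')
    plt.xticks(positions, labels)
    plt.show()


word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
labels, values = sorted_counts(word_list)
```

Calling plot_sorted_counts(word_list) would then draw the bars in the order C, A, B.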

In case you want to plot only the first n entries, you can replace the line

counts = Counter(word_list)

with

counts = dict(Counter(word_list).most_common(n))

For n = 2, counts would then be

{'C': 4, 'A': 3}

(on Python 3.7+, where dicts preserve insertion order; older versions may print the keys in a different order).
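
If you would rather not depend on dict ordering at all, one option is to keep the list of pairs that most_common(n) returns and unpack it directly. This is a minimal sketch; top_n_counts is a hypothetical helper name, not part of any library.

```python
from collections import Counter


def top_n_counts(words, n):
    """Return (labels, values) for the n most frequent words, highest first."""
    # most_common(n) returns a list of at most n (word, count) pairs,
    # already sorted by count in descending order
    pairs = Counter(words).most_common(n)
    if not pairs:
        # nothing to unpack for an empty word list or n == 0
        return (), ()
    labels, values = zip(*pairs)
    return labels, values


word_list = ['A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C']
top_labels, top_values = top_n_counts(word_list, 2)
```

The two tuples can be passed to plt.bar and plt.xticks in the same way as labels and values in the code above, without the argsort step.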

If you would like to remove the frame of the plot and label the bars directly, you can check this post.

173

answered Oct 03 '22 03:10

Cleb

Tags: python, matplotlib, python-2.7, ranking, histogram
