
Matplotlib histogram with collection bin for high values

I have an array with values, and I want to create a histogram of it. I am mainly interested in the low end numbers, and want to collect every number above 300 in one bin. This bin should have the same width as all other (equally wide) bins. How can I do this?

Note: this question is related to this question: Defining bin width/x-axis scale in Matplotlib histogram

This is what I tried so far:

import matplotlib.pyplot as plt
import numpy as np

def plot_histogram_01():
    np.random.seed(1)
    values_A = np.random.choice(np.arange(600), size=200, replace=True).tolist()
    values_B = np.random.choice(np.arange(600), size=200, replace=True).tolist()

    bins = [0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 600]

    fig, ax = plt.subplots(figsize=(9, 5))
    _, bins, patches = plt.hist([values_A, values_B], normed=1,  # normed is deprecated and will be replaced by density
                                bins=bins,
                                color=['#3782CC', '#AFD5FA'],
                                label=['A', 'B'])

    xlabels = np.array(bins[1:], dtype='|S4')
    xlabels[-1] = '300+'

    N_labels = len(xlabels)
    plt.xlim([0, 600])
    plt.xticks(25 * np.arange(N_labels) + 12.5)
    ax.set_xticklabels(xlabels)

    plt.yticks([])
    plt.title('')
    plt.setp(patches, linewidth=0)
    plt.legend()

    fig.tight_layout()
    plt.savefig('my_plot_01.png')
    plt.close()

This is the result, which does not look nice: [image]

I then changed the line with xlim in it:

plt.xlim([0, 325]) 

With the following result: [image]

It looks more or less as I want it, but the last bin is not visible now. Which trick am I missing to visualize this last bin with a width of 25?

asked Oct 06 '14 at 14:10 by physicalattraction


1 Answer

NumPy has a handy function for dealing with this: np.clip. Despite what the name may suggest, it doesn't remove values; it just limits them to the range you specify, so every value above the upper limit is set to that limit. Basically, it does Artem's "dirty hack" inline. You can leave the values as they are, and in the hist call just wrap the array in an np.clip call (with bins ending at 325 rather than 600, so that the last bin is 25 wide), like so:

plt.hist(np.clip(values_A, bins[0], bins[-1]), bins=bins) 
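To see what the clipping does on its own, here is a minimal sketch. The sample values are made up for illustration; the bin edges are the ones used in the full example further down.

```python
import numpy as np

# Made-up sample values, some of them above the last bin edge.
values = np.array([5, 120, 299, 300, 450, 599])
bins = np.arange(0, 350, 25)  # edges 0, 25, ..., 325

# Values outside [bins[0], bins[-1]] are moved onto the nearest edge;
# nothing is dropped.
clipped = np.clip(values, bins[0], bins[-1])

print(clipped.tolist())             # [5, 120, 299, 300, 325, 325]
print(len(clipped) == len(values))  # True
```

The two values above 325 end up exactly on the last edge, which the histogram counts as part of the last bin.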

This is nicer for a number of reasons:

  1. It's way faster, at least for large numbers of elements. NumPy does its work at the C level, whereas operating on Python lists (as in Artem's list comprehension) has a lot of overhead for each element. Basically, if you ever have the option to use NumPy, you should.

  2. You do it right where it's needed, which reduces the chance of making mistakes in your code.

  3. You don't need to keep a second copy of the array hanging around, which reduces memory usage (except within this one line) and further reduces the chances of making mistakes.

  4. Using bins[0], bins[-1] instead of hard-coding the values reduces the chances of making mistakes again, because you can change the bins just where bins was defined; you don't need to remember to change them in the call to clip or anywhere else.
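The speed claim in point 1 can be checked with a rough sketch like the one below. The list-comprehension version is only an assumption about what such a pure-Python approach looks like (it is not copied from the other answer), and the helper names and array size are hypothetical. Actual timings depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
values = rng.integers(0, 600, size=100_000)
values_list = values.tolist()  # plain Python list for the pure-Python variant
lo, hi = 0, 325

def clip_with_list_comprehension(vals):
    # Hypothetical pure-Python equivalent: one min/max call per element.
    return [min(max(v, lo), hi) for v in vals]

def clip_with_numpy(vals):
    # Vectorised: the loop over elements runs inside NumPy.
    return np.clip(vals, lo, hi)

# Both approaches should produce the same clipped values.
same = np.array_equal(clip_with_list_comprehension(values_list),
                      clip_with_numpy(values))
print(same)

t_list = timeit.timeit(lambda: clip_with_list_comprehension(values_list), number=3)
t_numpy = timeit.timeit(lambda: clip_with_numpy(values), number=3)
print(f"list comprehension: {t_list:.4f} s, np.clip: {t_numpy:.4f} s")
```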

So to put it all together as in the OP:

import matplotlib.pyplot as plt
import numpy as np

def plot_histogram_01():
    np.random.seed(1)
    values_A = np.random.choice(np.arange(600), size=200, replace=True)
    values_B = np.random.choice(np.arange(600), size=200, replace=True)

    bins = np.arange(0,350,25)

    fig, ax = plt.subplots(figsize=(9, 5))
    _, bins, patches = plt.hist([np.clip(values_A, bins[0], bins[-1]),
                                 np.clip(values_B, bins[0], bins[-1])],
                                # normed=1,  # normed is deprecated; replace with density
                                density=True,
                                bins=bins, color=['#3782CC', '#AFD5FA'], label=['A', 'B'])

    xlabels = bins[1:].astype(str)
    xlabels[-1] += '+'

    N_labels = len(xlabels)
    plt.xlim([0, 325])
    plt.xticks(25 * np.arange(N_labels) + 12.5)
    ax.set_xticklabels(xlabels)

    plt.yticks([])
    plt.title('')
    plt.setp(patches, linewidth=0)
    plt.legend(loc='upper left')

    fig.tight_layout()

plot_histogram_01()

[image: result of code above]
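If you want to convince yourself that the last bin really collects everything from 300 upwards, a quick check with np.histogram (which plt.hist uses for the counting) might look like this. It is only a sanity-check sketch that reuses the seed, values and bins from the code above, without any plotting.

```python
import numpy as np

np.random.seed(1)
values_A = np.random.choice(np.arange(600), size=200, replace=True)
bins = np.arange(0, 350, 25)  # last bin is [300, 325]

counts, edges = np.histogram(np.clip(values_A, bins[0], bins[-1]), bins=bins)

# No value is lost by clipping ...
print(counts.sum() == len(values_A))          # True
# ... and the last bin holds every value that was 300 or larger.
print(counts[-1] == np.sum(values_A >= 300))  # True
```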

answered Sep 17 '22 at 03:09 by Mike