
matplotlib normed histograms

I'm trying to draw part of a histogram using matplotlib.

Instead of drawing the whole histogram, which has a lot of outliers and large values, I want to focus on just a small part. The original histogram looks like this:

hist(data, bins=arange(data.min(), data.max(), 1000), normed=1, cumulative=False)
plt.ylabel("PDF")


And after focusing it looks like this:

hist(data, bins=arange(0, 121, 1), normed=1, cumulative=False)
plt.ylabel("PDF")


Notice that the last bin is stretched and, worst of all, the Y ticks are rescaled so that the plotted bars alone integrate to exactly 1 (so points outside the current range are not taken into account at all).

I know that I can achieve what I want by drawing the histogram over the whole possible range and then restricting the axis to the part I'm interested in, but it wastes a lot of time calculating bins that I won't use/see anyway.

hist(btsd-40, bins=arange(btsd.min(), btsd.max(), 1), normed=1, cumulative=False)
axis([0,120,0,0.0025])


Is there a fast and easy way to draw just the focused region but still get the Y scale correct?

cdecker, asked Sep 05 '12 14:09


2 Answers

In order to plot a subset of the histogram, I don't think you can get around calculating the whole histogram.

Have you tried computing the histogram with numpy.histogram and then plotting only the region you need with matplotlib's bar? For example:

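import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(size=10000)*10000

# Figure 0: histogram drawn directly by matplotlib
plt.figure(0)
plt.hist(data, bins=np.arange(data.min(), data.max(), 1000))

# Figure 1: the same histogram computed with numpy and drawn as bars
plt.figure(1)
counts1, edges1 = np.histogram(data, bins=np.arange(data.min(), data.max(), 1000))
plt.bar(edges1[:-1], counts1, width=1000, align='edge')

# Figure 2: narrower bins, keeping only bins whose left edge lies in (0, 20000)
plt.figure(2)
counts2, edges2 = np.histogram(data, bins=np.arange(data.min(), data.max(), 200))
left = edges2[:-1]  # one left edge per bin, so it matches the counts array
mask = (left > 0) & (left < 20000)
plt.bar(left[mask], counts2[mask], width=200, align='edge')

plt.show()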

[Figure: Original histogram]

[Figure: Histogram calculated manually]

[Figure: Histogram calculated manually, cropped (N.B.: values are smaller because bins are narrower)]
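If the goal is a PDF-style Y scale rather than raw counts, one possible variation (a sketch, not something tested against the asker's data) is to let numpy bin only the focused range and then divide by the size of the full sample instead of the number of points in view. The helper name `focused_density` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def focused_density(data, lo, hi, bin_width):
    """Hypothetical helper: density over [lo, hi], normalized by ALL samples."""
    data = np.asarray(data)
    # Edges lo, lo + w, ..., hi (integer inputs assumed; float steps may need care)
    edges = np.arange(lo, hi + bin_width, bin_width)
    # Values outside the edges are ignored, so only the focused bins are computed
    counts, edges = np.histogram(data, bins=edges)
    # Dividing by the full sample size keeps out-of-range points in the normalization
    density = counts / (data.size * np.diff(edges))
    return density, edges

# Synthetic stand-in for the real data
rng = np.random.default_rng(0)
data = rng.normal(size=10000) * 10000

density, edges = focused_density(data, 0, 120, 1)
```

The result could then be drawn with something like `plt.bar(edges[:-1], density, width=np.diff(edges), align='edge')`. The bars integrate to the fraction of samples that fall inside the window, not to 1.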

Tim, answered Oct 12 '22 15:10


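I think you can normalize the histogram yourself by passing a weight for each data point (repeat is a numpy function). With every weight set to 1/len(data), each bar shows the fraction of all samples that fall in that bin, including the samples outside the plotted range: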

hist(data, bins=arange(0, 121, 1), weights=repeat(1.0/len(data), len(data)))
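As a quick sanity check of the weighting idea, the following sketch uses `np.histogram` (which accepts the same `weights` argument) on synthetic data; the data and variable names are assumptions for illustration. It compares the weighted heights with the plain counts divided by the total sample size.

```python
import numpy as np

# Synthetic stand-in for the real data
rng = np.random.default_rng(1)
data = rng.normal(size=10000) * 10000
edges = np.arange(0, 121, 1)

# One weight per sample; np.full is equivalent to repeat(1.0/len(data), len(data))
weights = np.full(data.size, 1.0 / data.size)
heights, _ = np.histogram(data, bins=edges, weights=weights)

# Reference: unweighted counts scaled by the full sample size
counts, _ = np.histogram(data, bins=edges)
fraction_in_view = counts.sum() / data.size
```

Here `heights` matches `counts / data.size`, and the bars sum to `fraction_in_view` rather than 1. Because the bins are 1 unit wide, these fractions also equal the density; for other bin widths the heights would additionally need to be divided by the width.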

Sunhwan Jo, answered Oct 12 '22 13:10