
Histogram with equal number of points in each bin

I have a sorted vector points with 100 points. I now want to create two histograms. The first histogram should have 10 bins of equal width. The second should also have 10 bins, but not necessarily of equal width: I just want it to have the same number of points in each bin. So, for example, the first bar might be very short and wide, while the second bar might be very tall and narrow. I have code that creates the first histogram using matplotlib, but I'm not sure how to go about creating the second one.

import matplotlib.pyplot as plt
points = [1,2,3,4,5,6, ..., 99]
n, bins, patches = plt.hist(points, 10)

Edit:

Trying the solution below, I'm a bit puzzled as to why the heights of all of the bars in my histogram are the same.


Apollo asked Sep 09 '16 19:09



1 Answer

This question is similar to one that I wrote an answer to a while back, but sufficiently different to warrant its own question. The solution, it turns out, uses basically the same code as my other answer.

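import numpy as np
import matplotlib.pyplot as plt

def histedges_equalN(x, nbin):
    # Return nbin + 1 bin edges so that each bin holds about len(x) / nbin points.
    npt = len(x)
    # Interpolate the sorted data at nbin + 1 evenly spaced index positions.
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))

x = np.random.randn(100)
n, bins, patches = plt.hist(x, histedges_equalN(x, 10))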

This solution gives a histogram with bars of equal height because, by definition, each bar of a histogram shows the count of points in its bin, and here every bin holds the same number of points.
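A quick way to check the equal-count claim without plotting is to pass the same edges to np.histogram. This is only a sketch: the seeded generator and the variable names (rng, edges, counts, widths) are chosen for illustration and are not part of the original answer.

```python
import numpy as np

def histedges_equalN(x, nbin):
    # Same helper as in the answer: nbin + 1 edges taken from the sorted data.
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))

# Assumption: a seeded generator, used only to make the check repeatable.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)

edges = histedges_equalN(x, 10)
counts, _ = np.histogram(x, bins=edges)   # points per bin
widths = np.diff(edges)                   # bin widths, which vary

print(counts)
print(widths)
```

With 100 distinct values and 10 bins, every entry of counts should be 10, while widths varies from bin to bin. When the number of points is not a multiple of the number of bins, the edges fall between data points and the counts are only approximately equal.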

To get a pdf (i.e. a density function), pass the density=True kwarg to plt.hist (normed=True in older Matplotlib versions), as described in my other answer.
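Newer Matplotlib releases use density=True in place of normed=True. The sketch below shows what that normalization computes, using np.histogram, which accepts the same density keyword. Each height becomes count / (total count * bin width), so narrow bins come out tall and wide bins come out short. The seeded generator and the names manual and area are illustrative assumptions.

```python
import numpy as np

def histedges_equalN(x, nbin):
    # Same helper as in the answer: nbin + 1 edges taken from the sorted data.
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))

# Assumption: a seeded generator, used only to make the example repeatable.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
edges = histedges_equalN(x, 10)

counts, _ = np.histogram(x, bins=edges)
density, _ = np.histogram(x, bins=edges, density=True)
widths = np.diff(edges)

# Density normalization by hand: count divided by (total count * bin width).
manual = counts / (counts.sum() * widths)

# The bar areas (height * width) of a density histogram sum to 1.
area = np.sum(density * widths)

print(density)
print(area)
```

To draw the result, plt.hist(x, edges, density=True) should give bars with these heights.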

farenorth answered Oct 21 '22 03:10