Matplotlib: How to make a histogram with bins of equal area?

Given some list of numbers following some arbitrary distribution, how can I define bin positions for matplotlib.pyplot.hist() so that the area in each bin is equal to (or close to) some constant area, A? The area should be calculated by multiplying the number of items in the bin by the width of the bin and its value should be no greater than A.

Here is an MWE that displays a histogram of normally distributed sample data:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(100)
plt.hist(x, bin_pos)
plt.show()

Here bin_pos is a list giving the positions of the bin boundaries (see the related question).

asked Jun 06 '16 03:06

wrkyle

1 Answer

I found this question intriguing. The solution depends on whether you want to plot a density function or a true histogram. The latter case turns out to be quite a bit more challenging. The difference is that a density function normalizes the bar heights so that the total area is 1, whereas a true histogram plots the raw counts in each bin.

Density Functions

This will do what you want for a density function:

def histedges_equalN(x, nbin):
    # Bin edges chosen so that each bin holds the same number of points
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))

x = np.random.randn(1000)
# density=True normalizes the bars so that the total area is 1
n, bins, patches = plt.hist(x, histedges_equalN(x, 10), density=True)

Note the use of density=True (spelled normed=True in older Matplotlib versions; that argument was removed in Matplotlib 3.1), which specifies that we're calculating and plotting a density function. In this case the areas are equal (you can check by looking at n * np.diff(bins)). Also note that this solution works by finding bins that contain the same number of points.

[Figure: equal-area density function]
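The equal-area claim can be checked numerically without drawing anything. The following is a minimal sketch, assuming that np.histogram with density=True computes the same bar heights that plt.hist would plot. The fixed seed and the names rng, edges, density, areas and counts are illustrative choices, not part of the original answer.

```python
import numpy as np

def histedges_equalN(x, nbin):
    # Same function as in the answer: edges at equally spaced ranks
    npt = len(x)
    return np.interp(np.linspace(0, npt, nbin + 1),
                     np.arange(npt),
                     np.sort(x))

rng = np.random.default_rng(0)        # arbitrary seed, for repeatability only
x = rng.standard_normal(1000)

edges = histedges_equalN(x, 10)
density, _ = np.histogram(x, bins=edges, density=True)
counts, _ = np.histogram(x, bins=edges)

# Area of each bar = height * width; for a density these sum to 1
areas = density * np.diff(edges)
print("points per bin:", counts)
print("area per bin:  ", np.round(areas, 4))
```

Because every bin holds the same share of the points, each area should land close to 1/nbin.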

Histograms

Here is a solution that gives approximately equal area boxes for a histogram:

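def histedges_equalA(x, nbin):
    pow = 0.5  # exponent applied to the spacing between sorted points
    dx = np.diff(np.sort(x))
    # Cumulative sum of spacing**pow, padded with a leading 0 so that
    # it has one entry per sorted data point
    tmp = np.cumsum(dx ** pow)
    tmp = np.pad(tmp, (1, 0), 'constant')
    # Edges are equally spaced in tmp, then mapped back to data values
    return np.interp(np.linspace(0, tmp.max(), nbin + 1),
                     tmp,
                     np.sort(x))

nbin = 10  # number of bins
# Raw counts (not a density) are the default for plt.hist
n, bins, patches = plt.hist(x, histedges_equalA(x, nbin))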

These boxes, however, are not all equal in area. The first and last, in particular, tend to be about 30% larger than the others. This is an artifact of the sparse distribution of the data at the tails of the normal distribution, and I believe it will persist any time there is a sparsely populated region in a data set.

Side note: I played with the value pow a bit, and found that a value of about 0.56 had a lower RMS error for the normal distribution. I stuck with the square-root because it performs best when the data is tightly-spaced (relative to the bin-width), and I'm pretty sure there is a theoretical basis for it that I haven't bothered to dig into (anyone?).

[Figure: nearly equal-area histogram]
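To see how uneven the boxes are for a given sample, the areas can be computed directly as count times width. This is only a sketch: the power argument is a parameterized stand-in for the hard-coded exponent in the answer, and bin_areas is a hypothetical helper introduced here for illustration. The numbers it prints depend on the random sample, so none are quoted.

```python
import numpy as np

def histedges_equalA(x, nbin, power=0.5):
    # Same approach as the answer, with the exponent exposed as an argument
    xs = np.sort(x)
    tmp = np.cumsum(np.diff(xs) ** power)
    tmp = np.pad(tmp, (1, 0), 'constant')
    return np.interp(np.linspace(0, tmp.max(), nbin + 1), tmp, xs)

def bin_areas(x, edges):
    # Histogram "area" as defined in the question: count * bin width
    counts, _ = np.histogram(x, bins=edges)
    return counts * np.diff(edges)

rng = np.random.default_rng(0)        # arbitrary seed, for repeatability only
x = rng.standard_normal(1000)

for power in (0.5, 0.56):
    areas = bin_areas(x, histedges_equalA(x, 10, power))
    rel = areas / areas.mean()        # 1.0 would mean a perfectly average box
    rms = np.sqrt(np.mean((rel - 1.0) ** 2))
    print(f"power={power}: relative areas {np.round(rel, 2)}, RMS error {rms:.3f}")
```

Running this over several seeds is one way to compare exponents, since a single sample can favor either value.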

The issue with equal-area histograms

As far as I can tell it is not possible to obtain an exact solution to this problem. This is because it is sensitive to the discretization of the data. For example, suppose the first point in your dataset is an outlier at -13 and the next value is at -3, as depicted by the red dots in this image:

[Figure: diagram demonstrating histogram bin areas]

Now suppose the total "area" of your histogram is 150 and you want 10 bins. In that case the area of each histogram bar should be about 15, but you can't get there because as soon as your bar includes the second point, its area jumps from 10 (a width of 10 times one point) to 20 (the same width times two points). That is, the data does not allow this bar to have an area between 10 and 20. One solution for this might be to adjust the lower bound of the box to increase its area, but this starts to become arbitrary and does not work if this 'gap' is in the middle of the data set.
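The jump can be reproduced with a few lines. This is a sketch under stated assumptions: the values after -13 and -3 are made up, the left edge of the bar is pinned at the outlier, and bar_area is a hypothetical helper that counts points in the closed interval [left, right].

```python
import numpy as np

def bar_area(data, left, right):
    # Area of a single bar: (number of points in [left, right]) * width
    data = np.asarray(data)
    count = np.count_nonzero((data >= left) & (data <= right))
    return count * (right - left)

data = [-13.0, -3.0, -2.5, -2.0]   # outlier at -13, next point at -3; rest are made up
left = -13.0                       # left edge pinned at the outlier

just_before = bar_area(data, left, -3.0 - 1e-9)   # bar still holds one point
at_second = bar_area(data, left, -3.0)            # bar now holds two points

# Sweep the right edge and look for any area strictly between 10 and 20
rights = np.linspace(-12.9, -2.6, 1000)
areas = np.array([bar_area(data, left, r) for r in rights])
reachable = bool(np.any((areas > 10) & (areas < 20)))

print(just_before, at_second, reachable)
```

With the left edge fixed, no choice of right edge yields the target area of 15, which is why only an approximate solution is available.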

answered Sep 27 '22 19:09

farenorth

Tags: python, matplotlib, histogram, binning