
Python: Histogram with area normalized to something other than 1

Is there a way to tell matplotlib to "normalize" a histogram such that its area equals a specified value (other than 1)?

The option "density=False" (formerly "normed=0") in

n, bins, patches = plt.hist(x, 50, density=False, histtype='stepfilled')

just brings it back to a frequency distribution.

Asked Jan 27 '12 16:01 by Pawin




2 Answers

Compute the histogram yourself with np.histogram, scale it to any value you'd like, then use bar to plot it.

On a side note, this normalizes things such that the total area of the bars is normed_value. The raw sum of the bar heights will not be normed_value (though it's easy to have that be the case, if you'd like).

E.g.

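import numpy as np
import matplotlib.pyplot as plt

x = np.random.random(100)
normed_value = 2

# density=True scales the bars so that their total area is 1
hist, bins = np.histogram(x, bins=20, density=True)
widths = np.diff(bins)
# Rescale so that the total area becomes normed_value
hist *= normed_value

# bins[:-1] holds the left edges, so align the bars on their edges
plt.bar(bins[:-1], hist, widths, align='edge')
plt.show()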


So, in this case, if we were to integrate the bins (sum each bar's height multiplied by its width), we'd get 2.0 instead of 1.0, i.e. (hist * widths).sum() will yield 2.0.
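
As a rough sketch of the side note above (area versus raw sum), the two normalizations can be put side by side. The helper name scaled_histogram and its mode parameter are made up for this illustration and are not part of NumPy or Matplotlib; the sample data is arbitrary.

```python
import numpy as np

def scaled_histogram(x, bins, target, mode="area"):
    """Hypothetical helper: return (heights, edges) scaled so that either
    the total bar area or the plain sum of bar heights equals target."""
    counts, edges = np.histogram(x, bins=bins)
    counts = counts.astype(float)
    widths = np.diff(edges)
    if mode == "area":
        # Divide by the widths so that sum(heights * widths) == target
        heights = target * counts / (counts.sum() * widths)
    elif mode == "sum":
        # Ignore the widths so that sum(heights) == target
        heights = target * counts / counts.sum()
    else:
        raise ValueError("mode must be 'area' or 'sum'")
    return heights, edges

rng = np.random.default_rng(0)
x = rng.random(100)

area_heights, edges = scaled_histogram(x, 20, 2.0, mode="area")
sum_heights, _ = scaled_histogram(x, 20, 2.0, mode="sum")
widths = np.diff(edges)

print((area_heights * widths).sum())  # total area of the bars
print(sum_heights.sum())              # raw sum of the bar heights
```

Either set of heights can then be plotted with plt.bar in the same way as the example above.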

Answered Oct 26 '22 01:10 by Joe Kington


You can pass a weights argument to hist instead of using normed (called density in current Matplotlib). For example, if your bins cover the interval [minval, maxval], you have n bins, and you want to normalize the area to A, then I think

# Same constant weight for every sample; np.full stays floating-point even if x holds integers
weights = np.full(x.shape, A * n / (maxval - minval) / x.size)
plt.hist(x, bins=n, range=(minval, maxval), weights=weights)

should do the trick.

EDIT: The weights argument must be the same size as x, and its effect is to make each value in x contribute the corresponding value in weights towards the bin count, instead of 1.

I think the hist function could probably do with a greater ability to control normalization, though. For example, I think as it stands, values outside the binned range are ignored when normalizing, which isn't generally what you want.
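
To see how values outside the binned range affect the two approaches, here is a small numerical check. It is only a sketch: np.histogram stands in for plt.hist (which takes the same bins, range, and weights arguments) so the areas can be computed without drawing anything, and the data, the range, and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)      # sample data (assumption for illustration)
A = 2.0                        # target area
n = 20                         # number of bins
minval, maxval = -1.0, 1.0     # deliberately narrower than the data

# Fraction of the samples that fall inside the binned range
inside_fraction = np.mean((x >= minval) & (x <= maxval))

# Constant per-sample weight: it divides by x.size, so samples
# outside the range still count towards the normalization.
weights = np.full(x.shape, A * n / (maxval - minval) / x.size)
weighted, edges = np.histogram(x, bins=n, range=(minval, maxval), weights=weights)
area_weighted = (weighted * np.diff(edges)).sum()

# density=True normalizes over the in-range samples only.
dens, _ = np.histogram(x, bins=n, range=(minval, maxval), density=True)
area_density = (A * dens * np.diff(edges)).sum()

print(area_weighted, A * inside_fraction)  # these two should agree
print(area_density)                        # this should be close to A
```

With the constant weights, the plotted area comes out as A times the fraction of samples inside the range, whereas the density-based scaling gives A regardless of how many samples were dropped.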

Answered Oct 26 '22 00:10 by James