I am creating histograms of data organized in a dataframe and grouped by day. On some days the data may be identically zero. When I plot the histogram with normed = True, I would therefore expect a single bin centered at zero with height equal to 1. However, I see that the height is equal to the number of bins. How can I fix this? I want the histogram to represent a probability density function, so the maximum value should be 1.
Code sample and output:
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['figure.figsize'] = 10, 4
data = np.zeros(1000)
l = plt.hist(data, normed=True, bins=100)
EDIT: I now see that the normed property is deprecated. However, if I try to use the density attribute instead, I get the error AttributeError: Unknown property density
The plot you see is correct, because it is the area under the histogram, not the height of its bars, that should be 1. This is indeed the case in your plot. To highlight this, I draw a vertical line at x=0.01, and you will notice that the width of the bar is indeed 0.01. Since the height of the bar is 100, the area is 100 * 0.01 = 1.
plt.rcParams['figure.figsize'] = 10, 4
data = np.zeros(1000)
l = plt.hist(data, normed=True, bins=100)
# Vertical line at x=0.01 to make the width of the bar visible
plt.axvline(0.01, lw=1)
plt.ylim(0, 150)
The same happens if you use density=True:
l = plt.hist(data, density=True, bins=100)
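The area argument can also be checked numerically, without plotting. This is only a minimal sketch: it assumes np.histogram bins the data the same way plt.hist does for this input, and the variable names are illustrative.

```python
import numpy as np

data = np.zeros(1000)

# Same settings as plt.hist(data, density=True, bins=100)
heights, edges = np.histogram(data, density=True, bins=100)
widths = np.diff(edges)

# Every sample falls into a single bin of width 0.01
tallest = heights.max()

# The quantity normalized to 1 is the total area (sum of height * width)
area = np.sum(heights * widths)

print(tallest, area)
```

The tallest bar can be far above 1 while the area stays at 1, which is what a density requires.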
Following jdehesa's suggestion, bins of width 1 give the result you expect:
l = plt.hist(data,density = True, bins=np.arange(-10, 11))
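Why this works can be seen from the bin widths: with edges at consecutive integers every bin has width 1, so height and area coincide. A small sketch using np.histogram in place of plt.hist (an assumption made only to avoid plotting):

```python
import numpy as np

data = np.zeros(1000)

# Bin edges at every integer from -10 to 10, so each bin has width 1
edges = np.arange(-10, 11)
heights, _ = np.histogram(data, density=True, bins=edges)

# With unit-width bins the density equals the fraction of samples per bin
tallest = heights.max()
area = np.sum(heights * np.diff(edges))

print(tallest, area)
```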
Using DavidG's suggestion, based on this answer, gives you a height of 1, but the area is not normalized to 1.
weights = np.ones_like(data)/float(len(data))
l = plt.hist(data,weights=weights)
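To make the difference between height and area concrete, here is a rough numerical check. It assumes the default of 10 bins and uses np.histogram instead of plt.hist so that no figure is needed.

```python
import numpy as np

data = np.zeros(1000)
weights = np.ones_like(data) / float(len(data))

# Default of 10 bins, mirroring plt.hist(data, weights=weights)
heights, edges = np.histogram(data, weights=weights)

# Bar heights are fractions of the sample, so they add up to 1
total_height = heights.sum()

# The area also depends on the bin width, so it is not 1 in general
area = np.sum(heights * np.diff(edges))

print(total_height, area)
```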
Finally, if you need both a height of 1 and a normalized area (a bar of width 1, hence area = 1), you can use a single bin:
l = plt.hist(data, density=True, bins=1)
plt.xlim(-10, 10)
As others have explained, normed=True (or density=True in recent versions of Matplotlib) makes the area under the histogram equal to 1. You can get a histogram that represents the fraction of the sample falling in each bin like this:
import matplotlib.pyplot as plt
import numpy as np
data = np.zeros((1000))
# Compute histogram
hist, bins = np.histogram(data, density=True, bins=100)
# Width of each bin
bins_w = np.diff(bins)
# Compute proportion of sample in each bin
hist_p = hist * bins_w
# Plot histogram
plt.bar(bins[:-1], hist_p, width=bins_w, align='edge')
You could also make a histogram where each bin has a width of 1, but that is a more limited solution.
EDIT: As pointed out in other answers, this is basically equivalent to giving the proper weights parameter to plt.hist.
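That equivalence can be checked directly. The sketch below compares the two routes on the same bin edges; it uses np.histogram rather than plt.hist, and the variable names are only illustrative.

```python
import numpy as np

data = np.zeros(1000)

# Route 1: density histogram rescaled by the bin widths
density, edges = np.histogram(data, density=True, bins=100)
proportions = density * np.diff(edges)

# Route 2: every sample weighted by 1/N, on the same bin edges
weights = np.ones_like(data) / len(data)
weighted, _ = np.histogram(data, weights=weights, bins=edges)

print(np.allclose(proportions, weighted))
```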