matplotlib.pyplot.hist wrong normed property [duplicate]

Question

I am creating histograms of data organized in a dataframe and grouped by day. On some days the data may be identically zero. When I plot the histogram with normed=True, I would therefore expect a single bin centered at zero with a height of 1. Instead, the height equals the number of bins. How can I fix this? I want the histogram to represent a probability density function, so the maximum value should be 1.

Code sample and output:

import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['figure.figsize'] = 10, 4
data = np.zeros(1000)
l = plt.hist(data, normed=True, bins=100)

EDIT: I have just seen that the normed property is deprecated. However, if I try to use density instead, I get the error AttributeError: Unknown property density

Sheldore · Accepted Answer

The plot you see is correct: with normed=True it is the area under the histogram that is normalized to 1, not the height of the tallest bar. That is indeed the case in your plot. To highlight this, I draw a vertical line at x=0.01, which shows that the width of the bar is 0.01. Since the height of the bar is 100, the area is 100 * 0.01 = 1.

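plt.rcParams['figure.figsize'] = 10, 4
data = np.zeros(1000)
# normed is the deprecated keyword used in the question
l = plt.hist(data, normed=True, bins=100)
# Vertical line at x=0.01 to make the bar width visible
plt.axvline(0.01, lw=1)
plt.ylim(0, 150)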

The same happens if you use density=True:

l = plt.hist(data, density=True, bins=100)

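The area argument above can also be checked numerically. The following is a minimal sketch that uses np.histogram instead of plt.hist so that no figure is needed; it assumes the same all-zero data and 100 bins as in the question.

```python
import numpy as np

data = np.zeros(1000)

# Same binning as in the question, computed without plotting
density, edges = np.histogram(data, bins=100, density=True)
widths = np.diff(edges)

print(widths[0])                 # width of each bin
print(density.max())             # height of the single non-empty bar
print(np.sum(density * widths))  # total area under the histogram
```

The height and the width are reciprocals of each other here, which is why the bar grows as the number of bins increases.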

Following jdehesa's suggestion, bins of width 1 give the result you expect:

l = plt.hist(data, density=True, bins=np.arange(-10, 11))

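As a quick check of why unit-width bins work, here is a small sketch (again with np.histogram rather than a plot, and assuming the same all-zero data): when every bin has width 1, the density in a bin equals the fraction of samples in it.

```python
import numpy as np

data = np.zeros(1000)

# Integer bin edges from -10 to 10, so every bin has width 1
edges_in = np.arange(-10, 11)
density, edges = np.histogram(data, bins=edges_in, density=True)

print(np.diff(edges))   # all bin widths are 1
print(density.max())    # height of the bar that contains 0
```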

Using DavidG's suggestion, based on this answer, gives a height of 1, but the area is then not normalized to 1.

weights = np.ones_like(data)/float(len(data))
l = plt.hist(data,weights=weights)

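To see what the weights do, the sketch below computes the same weighted histogram with np.histogram. It assumes the default of 10 bins, which is what plt.hist uses when bins is not given and the default rcParams are in effect.

```python
import numpy as np

data = np.zeros(1000)
weights = np.ones_like(data) / len(data)

# Each sample contributes 1/N, so bar heights are fractions of the sample
fractions, edges = np.histogram(data, bins=10, weights=weights)
area = np.sum(fractions * np.diff(edges))

print(fractions.sum())  # the fractions over all bins add up to 1
print(fractions.max())  # height of the single non-empty bar
print(area)             # area under the bars, which is not 1 here
```

The heights sum to 1, but because the bins are narrower than 1, the area under the bars is smaller than 1.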

Finally, if you need both a height of 1 and an area normalized to 1 (which requires a bar of width 1), you can use a single bin:

l = plt.hist(data, density=True, bins=1)
plt.xlim(-10, 10)

jdehesa · Answer

As others have explained, normed=True (or density=True in recent versions of Matplotlib) makes the area under the histogram equal to 1. You can get a histogram that represents the fraction of the sample falling in each bin like this:

import matplotlib.pyplot as plt
import numpy as np

data = np.zeros((1000))
# Compute histogram
hist, bins = np.histogram(data, density=True, bins=100)
# Width of each bin
bins_w = np.diff(bins)
# Compute proportion of sample in each bin
hist_p = hist * bins_w
# Plot histogram
plt.bar(bins[:-1], hist_p, width=bins_w, align='edge')

Result:

[Figure: histogram of fractions]

You could also make a histogram where each bin has a width of 1, but that is a more limited solution.

EDIT: As pointed out in other answers, this is basically equivalent to giving the proper weights parameter to plt.hist.
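The equivalence mentioned in the EDIT can be checked with a short sketch. The sample below is hypothetical (random normal data with a fixed seed, not the data from the question), chosen so that more than one bin is populated.

```python
import numpy as np

# Hypothetical sample, used only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(size=500)

# Route 1: density histogram multiplied by the bin widths
density, edges = np.histogram(sample, bins=20, density=True)
prop_from_density = density * np.diff(edges)

# Route 2: each sample weighted by 1/N
weights = np.ones_like(sample) / len(sample)
prop_from_weights, _ = np.histogram(sample, bins=20, weights=weights)

print(np.allclose(prop_from_density, prop_from_weights))
print(prop_from_weights.sum())
```

Both routes give the fraction of the sample in each bin, so the values sum to 1 regardless of the bin width.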

Tags:

python

matplotlib

plot

histogram

cholo14