matplotlib.pyplot.hist wrong normed property [duplicate]

I am creating histograms of data stored in a dataframe and grouped by day. On some days the data may be identically zero. In that case, when I plot the histogram with normed=True, I would expect a single bin centered at zero with height equal to 1. Instead, the height equals the number of bins. How can I fix this? I want the histogram to represent a probability density function, so the maximum value should be 1.

Code sample and output:

import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['figure.figsize'] = 10, 4
data = np.zeros(1000)
l = plt.hist(data, normed=True, bins=100)


EDIT: I have now seen that the normed property is deprecated. However, if I try to use density instead, I get the error AttributeError: Unknown property density

cholo14 asked Jan 27 '23 05:01


2 Answers

The plot you see is correct: with normed=True it is the area under the histogram, not the height of its bars, that is normalized to 1. This is indeed the case in your plot. To highlight this, I draw a vertical line at x=0.01; the bar spans from 0 to 0.01, so its width is 0.01. Since the height of the bar is 100, the area is 100 * 0.01 = 1.

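import matplotlib.pyplot as plt
import numpy as np

plt.rcParams['figure.figsize'] = 10, 4
data = np.zeros(1000)
# normed is deprecated in favor of density; newer Matplotlib releases no longer accept it
l = plt.hist(data, normed=True, bins=100)
plt.axvline(0.01, lw=1)  # marks the right edge of the bar
plt.ylim(0, 150)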

The same happens if you use density=True:

l = plt.hist(data, density=True, bins=100)
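The area argument can also be checked numerically, without plotting. This is a minimal sketch, assuming np.histogram bins the data the same way plt.hist does (plt.hist relies on it internally); the variable names are illustrative only.

```python
import numpy as np

data = np.zeros(1000)

# density=True scales the bars so that sum(height * width) equals 1
heights, edges = np.histogram(data, bins=100, density=True)
widths = np.diff(edges)

max_height = heights.max()             # height of the only non-empty bar
bar_width = widths[heights.argmax()]   # width of that bar
area = np.sum(heights * widths)        # total area under the histogram

print(max_height, bar_width, area)
```

With 100 bins the bar is narrow, so it has to be tall for the area to stay at 1.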


Following jdehesa's suggestion, bins of width 1 give the result you expect:

l = plt.hist(data, density=True, bins=np.arange(-10, 11))


Following DavidG's suggestion (based on this answer), weighting each sample by 1/N gives you a height of 1, but the area is then no longer normalized to 1.

weights = np.ones_like(data)/float(len(data))
l = plt.hist(data,weights=weights)
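To see why the height is 1 while the area is not, the same weights can be passed to np.histogram. This is a sketch under the assumption that the default of 10 bins applies, as in the plt.hist call above; the names fractions and area are illustrative.

```python
import numpy as np

data = np.zeros(1000)
weights = np.ones_like(data) / len(data)

# Each sample contributes 1/N, so every bar is the fraction of samples in its bin
fractions, edges = np.histogram(data, weights=weights)  # default: 10 bins
area = np.sum(fractions * np.diff(edges))

print(fractions.max(), fractions.sum(), area)
```

The fractions sum to 1, but the area is the fraction times the bin width, which is smaller than 1 whenever the bins are narrower than 1.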


Finally, if you need both a height of 1 and an area normalized to 1, you can use a single bin, which here has a width of 1 (hence area = 1):

l = plt.hist(data, density=True, bins=1)
plt.xlim(-10, 10)
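Both unit-width variants can be compared numerically. A small sketch, again assuming np.histogram matches the binning of plt.hist; for constant data NumPy widens the automatic range by 0.5 on each side, which is why a single bin ends up with width 1.

```python
import numpy as np

data = np.zeros(1000)

# Integer edges -10, -9, ..., 10: every bin has width 1
h_unit, e_unit = np.histogram(data, bins=np.arange(-10, 11), density=True)

# A single automatic bin over the widened range [-0.5, 0.5]
h_one, e_one = np.histogram(data, bins=1, density=True)

print(h_unit.max(), h_one[0], e_one)
```

In both cases the bin width is 1, so the density and the fraction of samples coincide.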


Sheldore answered Jan 29 '23 19:01


As others have explained, normed=True (or density=True in recent versions of Matplotlib) makes the area under the histogram equal to 1. You can get a histogram that represents the fraction of the sample falling in each bin like this:

import matplotlib.pyplot as plt
import numpy as np

data = np.zeros((1000))
# Compute histogram
hist, bins = np.histogram(data, density=True, bins=100)
# Width of each bin
bins_w = np.diff(bins)
# Compute proportion of sample in each bin
hist_p = hist * bins_w
# Plot histogram
plt.bar(bins[:-1], hist_p, width=bins_w, align='edge')

Result:

Histogram of fractions

You could also make a histogram where each bin has a width of 1, but that is a more limited solution.

EDIT: As pointed out in other answers, this is basically equivalent to giving the proper weights parameter to plt.hist.
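The equivalence can be checked on data that is not constant. This is a sketch with made-up normally distributed samples and an arbitrary seed; neither comes from the question.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, illustrative data only
data = rng.normal(size=1000)

# Route 1: density histogram multiplied by the bin widths
dens, edges = np.histogram(data, bins=20, density=True)
from_density = dens * np.diff(edges)

# Route 2: weight every sample by 1/N and reuse the same bin edges
weights = np.full(data.shape, 1.0 / data.size)
from_weights, _ = np.histogram(data, bins=edges, weights=weights)

print(np.allclose(from_density, from_weights))
```

Both routes give the fraction of samples per bin, so the values sum to 1 regardless of the bin widths.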

jdehesa answered Jan 29 '23 20:01