How to plot normalized histogram with pdf properly using matplotlib?

I am trying to plot a normalized histogram, following the example in the numpy.random.normal documentation. For this purpose I generate a normally distributed random sample.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
mu_true = 0
sigma_true = 0.1
s = np.random.normal(mu_true, sigma_true, 2000)

Then I fit a normal distribution to the data and calculate the pdf.

mu, sigma = stats.norm.fit(s)
points = np.linspace(stats.norm.ppf(0.01,loc=mu,scale=sigma),
                 stats.norm.ppf(0.9999,loc=mu,scale=sigma),100)
pdf = stats.norm.pdf(points,loc=mu,scale=sigma)

Then I display the fitted pdf and the data histogram.

plt.hist(s, 30, density=True);
plt.plot(points, pdf, color='r')
plt.show() 

I use density=True, but it is obvious that the pdf and the histogram are not normalized.

What would you suggest for plotting a truly normalized histogram and pdf?

Seaborn distplot also doesn't solve the problem.

import seaborn as sns
ax = sns.distplot(s)

Einar A asked Sep 20 '18 11:09


2 Answers

What makes you think it is not normalised? At a guess, it's probably because the heights of the columns extend to values greater than 1. However, this thinking is flawed: in a normalised histogram or pdf, it is the total area under it that equals one, not the heights. Each column contributes its height times its width to that area, so when the bin widths in x are much smaller than one (as yours are), it is not surprising that the column heights are greater than one!
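One way to check this numerically, as a minimal sketch: np.histogram with density=True returns the bar heights and the bin edges, so the area can be summed directly. The seed and the variable names here are illustrative assumptions; the mean, sigma, sample size and bin count are taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed here only for reproducibility
s = rng.normal(0, 0.1, 2000)    # same mean, sigma and sample size as the question

# density=True rescales the counts so that the histogram integrates to 1
heights, edges = np.histogram(s, bins=30, density=True)
widths = np.diff(edges)         # width of each bin along x

area = np.sum(heights * widths)
print(area)           # total area under the histogram
print(heights.max())  # tallest bar; exceeds 1 because the bins are narrow
```

The area comes out as one up to floating-point rounding, even though the tallest bar is well above one.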

You can see this clearly in the scipy example you link: the x-values there are much greater (by an order of magnitude), so it follows that the y-values are correspondingly smaller. You will see the same effect if you change your distribution to cover a wider range of values. Try a sigma of 10 instead of 0.1 and see what happens!
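To put rough numbers on the sigma comparison: the peak of a normal pdf is 1 / (sigma * sqrt(2 * pi)), so it scales inversely with sigma. The snippet below is only a sketch of that arithmetic; the dictionary name peaks is an assumption made for illustration.

```python
import math

# Peak height of a normal pdf at its mean: 1 / (sigma * sqrt(2 * pi))
peaks = {}
for sigma in (0.1, 10):
    peaks[sigma] = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    print(sigma, peaks[sigma])
```

With sigma = 0.1 the peak is close to 4, while with sigma = 10 it drops to about 0.04. Both curves still enclose an area of one.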

Jgd answered Nov 01 '22 14:11


import numpy as np
from numpy.random import seed, randn
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()

# Try this, for 𝜇 = 0
seed(0)
points = np.linspace(-5,5,100)
pdf    = norm.pdf(points,0,1)
plt.plot(points, pdf, color='r')
plt.hist(randn(50), density=True);
plt.show() 

# or this, for 𝜇 = 10
seed(0)
points = np.linspace(5,15,100)
pdf    = norm.pdf(points,10,1)
plt.plot(points, pdf, color='r')
plt.hist(10+randn(50), density=True);
plt.show() 

Muhammad Syamsuddin answered Nov 01 '22 16:11