
Python: Visualize a normal curve on data's histogram

Tags:

python

pandas

Thanks in advance for any assistance or tips.

I'm trying to overlay a fitted normal curve on the histogram of one of my DataFrame's columns. So far, I've been able to plot the histogram with:

df.radon_adj.hist(bins=30)


I have this 'template', but I run into errors with it.

import pylab as py
import numpy as np
from scipy import optimize

# Generate a histogram of the column; py.hist returns (counts, bin_edges, patches)
y = df.radon_adj
data = py.hist(y, bins=25)

# Equation for a Gaussian with amplitude a, mean b and standard deviation c
def f(x, a, b, c):
    return a * np.exp(-(x - b)**2.0 / (2 * c**2))

# Generate data from bins as a set of points (bin centers and counts)
x = np.array([0.5 * (data[1][i] + data[1][i+1]) for i in range(len(data[1])-1)])  # xrange is Python 2 only
y = data[0]

popt, pcov = optimize.curve_fit(f, x, y)

x_fit = np.linspace(x[0], x[-1], 100)
y_fit = f(x_fit, *popt)

py.plot(x_fit, y_fit, lw=4, color="r")  # a bare plot() is undefined; call it through py
HolaGonzalo asked Nov 24 '14 22:11


1 Answer

I wouldn't reinvent the wheel by defining the Gaussian equation yourself. Stand on the shoulders of the scipy package:

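import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import norm

df = pd.DataFrame({'A': np.random.normal(size=100)})

# density=True replaces the older normed=True, which recent Matplotlib releases no longer accept
df.A.plot(kind='hist', density=True)

# Standard normal PDF (mean 0, std 1) on a grid; named x to avoid shadowing the built-in range
x = np.arange(-4, 4, 0.001)
plt.plot(x, norm.pdf(x, 0, 1))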


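Note that the only 'magic' here is making sure the histogram is normalized (`normed=True`, spelled `density=True` in current Matplotlib), so that the bars have a total area of 1 and share a vertical scale with the PDF.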
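If the column is not standard normal, the curve's mean and standard deviation can be estimated from the data instead of being fixed at 0 and 1. Below is a minimal sketch of that idea, assuming `scipy.stats.norm.fit` for the estimate. The helper name `plot_fitted_normal` is hypothetical, and the synthetic column only stands in for `df.radon_adj`, whose real values are not shown in the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import norm


def plot_fitted_normal(values, bins=30):
    """Draw a density histogram of `values` with a fitted normal PDF on top.

    Hypothetical helper for illustration; returns (mu, sigma, ax).
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # norm.fit does not skip NaNs

    # Maximum-likelihood estimates of the mean and standard deviation
    mu, sigma = norm.fit(values)

    fig, ax = plt.subplots()
    # density=True scales the bars so their total area is 1, matching the PDF
    ax.hist(values, bins=bins, density=True, alpha=0.6)

    x = np.linspace(values.min(), values.max(), 200)
    ax.plot(x, norm.pdf(x, loc=mu, scale=sigma), lw=2, color="r")
    return mu, sigma, ax


# Synthetic stand-in for df.radon_adj (assumed data, not the asker's)
rng = np.random.default_rng(0)
df = pd.DataFrame({"radon_adj": rng.normal(loc=2.0, scale=0.5, size=1000)})
mu, sigma, ax = plot_fitted_normal(df["radon_adj"])
```

To keep raw counts on the vertical axis instead, one option is to leave the histogram unnormalized and multiply the PDF by `len(values) * bin_width`, which puts the curve on the same scale as the bars.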

JD Long answered Nov 02 '22 15:11