I'm generating a random sample of data and plotting its pdf, using scipy.stats.norm.fit to generate my loc and scale parameters.
I wanted to see how different my pdf would look if I just calculated the mean and std using numpy without any actual fitting. To my surprise, when I plot both pdfs and print both sets of mu and std, the results I get are exactly the same. So my question is: what is the point of norm.fit if I can just calculate the mean and std of my sample and still get the same results?
This is my code:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
data = norm.rvs(loc=0,scale=1,size=200)
mu1 = np.mean(data)
std1 = np.std(data)
print(mu1)
print(std1)
mu, std = norm.fit(data)
plt.hist(data, bins=25, density=True, alpha=0.6, color='g')
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
q = norm.pdf(x, mu1, std1)
plt.plot(x, p, 'k', linewidth=2)
plt.plot(x, q, 'r', linewidth=1)
title = "Fit results: mu = %.5f, std = %.5f" % (mu, std)
plt.title(title)
plt.show()
And these are the results I got:
[Figure: Pdf of a random set of values]
mu1 = 0.034824979915482716
std1 = 0.9945453455908072
scipy.stats.norm is a normal continuous random variable. The location (loc) keyword specifies the mean.
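The method norm.ppf() from scipy.stats takes a probability and returns the value below which that fraction of the distribution lies; for the standard normal, that value is the number of standard deviations from the mean. It is the inverse of norm.cdf() and corresponds to a one-tailed test on the density plot.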
The scale (scale) keyword specifies the standard deviation. Calling norm(loc, scale) returns a frozen RV object with the same methods but with the given location and scale held fixed (the normal distribution has no shape parameters).
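As a small illustration of the frozen form, here is a minimal sketch. The values loc=2.0 and scale=0.5 are arbitrary choices for the example, not anything from the question.

```python
import numpy as np
from scipy.stats import norm

# Arbitrary illustrative parameters (assumed for this example only).
rv = norm(loc=2.0, scale=0.5)  # frozen: loc and scale are now fixed

# The frozen object exposes the same methods without repeating the parameters.
frozen_pdf = rv.pdf(2.0)
direct_pdf = norm.pdf(2.0, loc=2.0, scale=0.5)

print(rv.mean(), rv.std())                 # the mean and std are loc and scale
print(np.isclose(frozen_pdf, direct_pdf))  # both calls give the same density
```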
The easiest way to calculate normal CDF probabilities in Python is to use the norm.cdf() function from the SciPy library. The probability that a random variable takes on a value less than 1.96 in a standard normal distribution is roughly 0.975.
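The 1.96 / 0.975 figures can be checked directly. This is just a quick sketch using the standard normal (default loc=0, scale=1), showing that norm.ppf() undoes norm.cdf().

```python
from scipy.stats import norm

# Probability that a standard normal value falls below 1.96.
p = norm.cdf(1.96)

# Inverse direction: the value below which 97.5% of the distribution lies.
z = norm.ppf(0.975)

print(round(p, 3))  # roughly 0.975
print(round(z, 2))  # roughly 1.96
```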
The point is that there are several other distributions out there besides the normal distribution. Scipy provides a consistent API for learning the parameters of these distributions from data. (Want an exponential distribution instead of a normal distribution? It’s scipy.stats.expon.fit.)
So sure, your way also works because the parameters of the normal distribution happen to be the mean and standard deviation. But this is about providing a consistent interface across distributions, including ones where that’s not true.
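A minimal sketch of both points, assuming a fixed seed (random_state=0) and an exponential sample with scale 2.0, neither of which is in the question. For the normal distribution the fit agrees with np.mean and np.std because np.std defaults to ddof=0, which is the maximum likelihood estimate; with np.std(data, ddof=1) the numbers would differ slightly. The same .fit call then works for the exponential distribution, whose parameters are a location and a scale rather than a mean and a standard deviation.

```python
import numpy as np
from scipy.stats import norm, expon

# Seed chosen only to make the example reproducible.
data = norm.rvs(loc=0, scale=1, size=200, random_state=0)

mu_fit, std_fit = norm.fit(data)
mu_np, std_np = np.mean(data), np.std(data)  # np.std uses ddof=0 by default

print(np.isclose(mu_fit, mu_np), np.isclose(std_fit, std_np))

# Same interface, different distribution (illustrative sample, scale assumed).
waits = expon.rvs(scale=2.0, size=200, random_state=1)

# floc=0 keeps the location fixed at zero so that only the scale is estimated.
loc_e, scale_e = expon.fit(waits, floc=0)
print(loc_e, scale_e)
```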