I have seen several questions in stackoverflow regarding how to fit a log-normal distribution
. Still there are two clarifications that I need known.
I have a sample data, the logarithm of which follows a normal distribution. So I can fit the data using scipy.stats.lognorm.fit
(i.e a log-normal distribution
)
The fit is working fine, and also gives me the standard deviation. Here is my piece of code with the results.
sample = np.log10(data) #taking the log10 of the data
scatter,loc,mean = stats.lognorm.fit(sample) #Gives the paramters of the fit
x_fit = np.linspace(13.0,15.0,100)
pdf_fitted = stats.lognorm.pdf(x_fit,scatter,loc,mean) #Gives the PDF
print "scatter for data is %s" %scatter
print "mean of data is %s" %mean
THE RESULT
scatter for data is 0.186415047243
mean for data is 1.15731050926
From the image you can clearly see that the mean is around 14.2, but what I get is 1.15??!!
Why is this so? clearly the log(mean) is also not near 14.2!!
In THIS POST and in THIS QUESTION it is mentioned that the log(mean)
is the actual mean.
But you can see from my above code, the fit that I have obtained is using a the sample = log(data)
and it also seems to fit well. However when I tried
sample = data
pdf_fitted = stats.lognorm.pdf(x_fit,scatter,loc,np.log10(mean))
The fit does not seem to work.
1) Why is the mean not 14.2?
2) How to draw fill/draw vertical lines showing the 1 sigma confidence region?
You say
I have a sample data, the logarithm of which follows a normal distribution.
Suppose data
is the array containing the samples. To fit this data to
a log-normal distribution using scipy.stats.lognorm
, use:
s, loc, scale = stats.lognorm.fit(data, floc=0)
Now suppose mu and sigma are the mean and standard deviation of the underlying normal distribution. To get the estimate of those values from this fit, use:
estimated_mu = np.log(scale)
estimated_sigma = s
(These are not the estimates of the mean and standard deviation of
the samples in data
. See the wikipedia page for the formulas
for the mean and variance of a log-normal distribution in terms of mu and sigma.)
To combine the histogram and the PDF, you can use, for example,
import matplotlib.pyplot as plt.
plt.hist(data, bins=50, normed=True, color='c', alpha=0.75)
xmin = data.min()
xmax = data.max()
x = np.linspace(xmin, xmax, 100)
pdf = stats.lognorm.pdf(x, s, scale=scale)
plt.plot(x, pdf, 'k')
If you want to see the log of the data, you could do something like the following. Note the the PDF of the normal distribution is used here.
logdata = np.log(data)
plt.hist(logdata, bins=40, normed=True, color='c', alpha=0.75)
xmin = logdata.min()
xmax = logdata.max()
x = np.linspace(xmin, xmax, 100)
pdf = stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma)
plt.plot(x, pdf, 'k')
By the way, an alternative to fitting with stats.lognorm
is to fit log(data)
using stats.norm.fit
:
logdata = np.log(data)
estimated_mu, estimated_sigma = stats.norm.fit(logdata)
Related questions:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With