Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A lognormal distribution in python

I have seen several questions in stackoverflow regarding how to fit a log-normal distribution. Still there are two clarifications that I need known.

I have a sample data, the logarithm of which follows a normal distribution. So I can fit the data using scipy.stats.lognorm.fit (i.e a log-normal distribution)

The fit is working fine, and also gives me the standard deviation. Here is my piece of code with the results.

sample = np.log10(data) #taking the log10 of the data

scatter,loc,mean = stats.lognorm.fit(sample) #Gives the paramters of the fit 

x_fit = np.linspace(13.0,15.0,100)
pdf_fitted = stats.lognorm.pdf(x_fit,scatter,loc,mean) #Gives the PDF

print "scatter for data is %s" %scatter
print "mean of data is %s" %mean  

enter image description hereTHE RESULT

scatter for data is 0.186415047243
mean for data is 1.15731050926

From the image you can clearly see that the mean is around 14.2, but what I get is 1.15??!! Why is this so? clearly the log(mean) is also not near 14.2!!

In THIS POST and in THIS QUESTION it is mentioned that the log(mean) is the actual mean.

But you can see from my above code, the fit that I have obtained is using a the sample = log(data) and it also seems to fit well. However when I tried

sample = data
pdf_fitted = stats.lognorm.pdf(x_fit,scatter,loc,np.log10(mean))

The fit does not seem to work.

1) Why is the mean not 14.2?

2) How to draw fill/draw vertical lines showing the 1 sigma confidence region?

like image 384
Srivatsan Avatar asked Oct 16 '14 13:10

Srivatsan


1 Answers

You say

I have a sample data, the logarithm of which follows a normal distribution.

Suppose data is the array containing the samples. To fit this data to a log-normal distribution using scipy.stats.lognorm, use:

s, loc, scale = stats.lognorm.fit(data, floc=0)

Now suppose mu and sigma are the mean and standard deviation of the underlying normal distribution. To get the estimate of those values from this fit, use:

estimated_mu = np.log(scale)
estimated_sigma = s

(These are not the estimates of the mean and standard deviation of the samples in data. See the wikipedia page for the formulas for the mean and variance of a log-normal distribution in terms of mu and sigma.)

To combine the histogram and the PDF, you can use, for example,

import matplotlib.pyplot as plt.

plt.hist(data, bins=50, normed=True, color='c', alpha=0.75)
xmin = data.min()
xmax = data.max()
x = np.linspace(xmin, xmax, 100)
pdf = stats.lognorm.pdf(x, s, scale=scale)
plt.plot(x, pdf, 'k')

If you want to see the log of the data, you could do something like the following. Note the the PDF of the normal distribution is used here.

logdata = np.log(data)
plt.hist(logdata, bins=40, normed=True, color='c', alpha=0.75)
xmin = logdata.min()
xmax = logdata.max()
x = np.linspace(xmin, xmax, 100)
pdf = stats.norm.pdf(x, loc=estimated_mu, scale=estimated_sigma)
plt.plot(x, pdf, 'k')

By the way, an alternative to fitting with stats.lognorm is to fit log(data) using stats.norm.fit:

logdata = np.log(data)
estimated_mu, estimated_sigma = stats.norm.fit(logdata)

Related questions:

  • Fitting lognormal distribution using Scipy vs Matlab
  • Lognormal Random Numbers Centered around a high value
like image 87
Warren Weckesser Avatar answered Sep 30 '22 19:09

Warren Weckesser