Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fitting a Weibull distribution using Scipy

I am trying to recreate maximum likelihood distribution fitting, I can already do this in Matlab and R, but now I want to use scipy. In particular, I would like to estimate the Weibull distribution parameters for my data set.

I have tried this:

import scipy.stats as s import numpy as np import matplotlib.pyplot as plt  def weib(x,n,a):     return (a / n) * (x / n)**(a - 1) * np.exp(-(x / n)**a)  data = np.loadtxt("stack_data.csv")  (loc, scale) = s.exponweib.fit_loc_scale(data, 1, 1) print loc, scale  x = np.linspace(data.min(), data.max(), 1000) plt.plot(x, weib(x, loc, scale)) plt.hist(data, data.max(), density=True) plt.show() 

And get this:

(2.5827280639441961, 3.4955032285727947) 

And a distribution that looks like this:

Weibull distribution using Scipy

I have been using the exponweib after reading this http://www.johndcook.com/distributions_scipy.html. I have also tried the other Weibull functions in scipy (just in case!).

In Matlab (using the Distribution Fitting Tool - see screenshot) and in R (using both the MASS library function fitdistr and the GAMLSS package) I get a (loc) and b (scale) parameters more like 1.58463497 5.93030013. I believe all three methods use the maximum likelihood method for distribution fitting.

Weibull distribution using Matlab

I have posted my data here if you would like to have a go! And for completeness I am using Python 2.7.5, Scipy 0.12.0, R 2.15.2 and Matlab 2012b.

Why am I getting a different result!?

like image 261
kungphil Avatar asked Jul 05 '13 05:07

kungphil


People also ask

How do you fit a Weibull distribution?

Fit Weibull Models InteractivelyIn the Curve Fitter app, select curve data. On the Curve Fitter tab, in the Data section, click Select Data. In the Select Fitting Data dialog box, select X Data and Y Data, or just Y Data against an index.

How do you calculate Weibull parameters?

There are many methods to calculate and estimate the power produced from wind at a specific location, but the accurate one vary depending on the parameters used in the process. It can be given by Weibull parameters as; (2) v ¯ = c Γ ( 1 + 1 k ) where n is the number of wind data, and is the wind speed.


1 Answers

My guess is that you want to estimate the shape parameter and the scale of the Weibull distribution while keeping the location fixed. Fixing loc assumes that the values of your data and of the distribution are positive with lower bound at zero.

floc=0 keeps the location fixed at zero, f0=1 keeps the first shape parameter of the exponential weibull fixed at one.

>>> stats.exponweib.fit(data, floc=0, f0=1) [1, 1.8553346917584836, 0, 6.8820748596850905] >>> stats.weibull_min.fit(data, floc=0) [1.8553346917584836, 0, 6.8820748596850549] 

The fit compared to the histogram looks ok, but not very good. The parameter estimates are a bit higher than the ones you mention are from R and matlab.

Update

The closest I can get to the plot that is now available is with unrestricted fit, but using starting values. The plot is still less peaked. Note values in fit that don't have an f in front are used as starting values.

>>> from scipy import stats >>> import matplotlib.pyplot as plt >>> plt.plot(data, stats.exponweib.pdf(data, *stats.exponweib.fit(data, 1, 1, scale=02, loc=0))) >>> _ = plt.hist(data, bins=np.linspace(0, 16, 33), normed=True, alpha=0.5); >>> plt.show() 

exponweib fit

like image 107
Josef Avatar answered Oct 05 '22 22:10

Josef