
Scipy - Stats - Meaning of parameters for probability distributions

The SciPy docs give the form of the exponential distribution as:

expon.pdf(x) = lambda * exp(- lambda*x)

However, the fit function takes:

fit(data, loc=0, scale=1)

And the rvs function takes:

rvs(loc=0, scale=1, size=1)

Question 1: Why the extraneous location variable? I know that the exponential is just a specific form of a more general distribution (gamma), but why include the unneeded information? Even gamma doesn't have a location parameter.

Question 2: Is the output of fit(...) in the same order as the input variables? By that I mean, if I do:

t = fit([....]), t will have the form t[0], t[1]

Should I interpret t[0] as the shape and t[1] as the scale?

Does this hold for all the distributions?

What about for gamma:

fit(data, a, loc=0, scale=1)
bearrito asked Jul 23 '13 15:07


1 Answer

  1. Every univariate probability distribution, whatever its usual formulation, can be extended to include a location and a scale parameter. Sometimes this entails extending the support of the distribution from just the positive/non-negative reals to the whole real number line, with a PDF value of 0 below the loc value. scipy.stats does this so that all of the handling of loc and scale lives in a common method shared by all distributions. It has been suggested that we remove this and make distributions like gamma loc-less, to follow their canonical formulations. However, it turns out that some people do actually use "shifted gamma" distributions with nonzero loc parameters (to model the sizes of sunspots, if I remember correctly), and the current behavior of scipy.stats was perfect for them. So we're keeping it.

  2. The output of the fit() method is a tuple of the form (shape_1, shape_2, ..., shape_N, loc, scale) if there are N shape parameters. For a normal distribution, which has no shape parameters, it will return just (loc, scale). For a gamma distribution, which has one, it will return (shape, loc, scale). Multiple shape parameters will be in the same order that you give them to every other method on the distribution. This holds for all distributions.
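To make both points concrete, here is a minimal sketch. The sample sizes, seeds, and parameter values are arbitrary choices for illustration, not anything from the question. It checks that loc and scale act as a shift and a rescaling of the standard form, that scale plays the role of 1/lambda for expon, and that fit() returns shape parameters first, then loc, then scale. It also uses the floc keyword, which (as far as I know) is how you pin loc to a fixed value in fit(); plain loc= and scale= passed to fit() are only starting guesses for the optimizer.

```python
import numpy as np
from scipy import stats

# Point 1: loc/scale are a generic shift and rescaling of the standard form:
#   pdf(x, loc, scale) == pdf((x - loc) / scale) / scale
x = np.linspace(0.5, 5.0, 10)
loc, scale = 0.5, 2.0  # arbitrary illustrative values
shifted = stats.expon.pdf(x, loc=loc, scale=scale)
standard = stats.expon.pdf((x - loc) / scale) / scale
assert np.allclose(shifted, standard)

# For expon, the textbook rate lambda corresponds to scale = 1 / lambda.
lam = 0.5
assert np.allclose(stats.expon.pdf(x, scale=1.0 / lam), lam * np.exp(-lam * x))

# Point 2: fit() returns (shape params..., loc, scale).
expon_data = stats.expon.rvs(loc=0, scale=2.0, size=5000, random_state=0)
expon_params = stats.expon.fit(expon_data)                # (loc, scale)
expon_params_fixed = stats.expon.fit(expon_data, floc=0)  # loc held at 0

gamma_data = stats.gamma.rvs(3.0, loc=0, scale=2.0, size=5000, random_state=1)
gamma_params = stats.gamma.fit(gamma_data, floc=0)        # (a, loc, scale)

print("expon fit (loc, scale):       ", expon_params)
print("expon fit with floc=0:        ", expon_params_fixed)
print("gamma fit (a, loc, scale):    ", gamma_params)
```

So for the question's t = fit(...) on an exponential, t[0] is loc and t[1] is scale; there is no shape entry at all. If you want the canonical loc-less form, fixing loc with floc=0 is usually what you are after.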

Robert Kern answered Oct 13 '22 09:10