
In scipy.stats rv_continuous has a fit method to find MLEs, but rv_discrete does not. Why?

I would like to find the Maximum Likelihood Estimator for some data that may be governed by a discrete distribution. But in scipy.stats only classes representing continuous distributions have a fit function to do that. What is the reason that the classes representing discrete distributions do not?

Keith Braithwaite asked May 08 '13 22:05



1 Answer

Short answer: because nobody wrote the code for it, or even tried, as far as I know.

Longer answer: I don't know how far we can get with the discrete models using a generic maximum likelihood method like the one for the continuous distributions, which works for many but not all of those.

Most discrete distributions have strong restrictions on their parameters, and most of them will most likely need a fit method specific to the distribution:

>>> from scipy import stats
>>> [(f, getattr(stats, f).shapes) for f in dir(stats) if isinstance(getattr(stats, f), stats.distributions.rv_discrete)]
[('bernoulli', 'pr'), ('binom', 'n, pr'), ('boltzmann', 'lamda, N'), 
 ('dlaplace', 'a'), ('geom', 'pr'), ('hypergeom', 'M, n, N'), 
 ('logser', 'pr'), ('nbinom', 'n, pr'), ('planck', 'lamda'), 
 ('poisson', 'mu'), ('randint', 'min, max'), ('skellam', 'mu1,mu2'), 
 ('zipf', 'a')]

statsmodels provides a few of the discrete models, where the parameters can also depend on some explanatory variables. Most of those, like generalized linear models, need a link function to restrict the parameter values to the valid range, for example the interval (0, 1) for probabilities, or values larger than zero for parameters in count models.
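To illustrate the link-function idea without any explanatory variables, here is a minimal sketch that fits the probability of a geometric distribution by optimizing over an unconstrained value and mapping it into (0, 1) with the logistic function. The function name fit_geom_logit is hypothetical, not part of scipy or statsmodels, and the data are simulated for the example.

```python
import numpy as np
from scipy import optimize, special, stats


def fit_geom_logit(data):
    """Hypothetical helper: MLE of p for stats.geom via a logit link.

    The optimizer works on an unconstrained theta; p = expit(theta)
    always lies strictly inside (0, 1), so no bounds are needed.
    """
    data = np.asarray(data)

    def negative_loglike(theta):
        p = special.expit(theta[0])
        return -stats.geom.logpmf(data, p).sum()

    result = optimize.minimize(negative_loglike, x0=[0.0], method="Nelder-Mead")
    return special.expit(result.x[0])


# Simulated data, only for illustration.
rng = np.random.default_rng(0)
sample = stats.geom.rvs(0.3, size=2000, random_state=rng)
p_hat = fit_geom_logit(sample)
```

For a parameter that only has to be positive, such as the mean of a count model, np.exp would play the same role as expit does here.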

Then there is the "n" parameter in the binomial, and some parameters of the other distributions, which are required to be integers. That makes it impossible to use the usual continuous minimizers from scipy.optimize.
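One possible workaround for an integer parameter, sketched here as an assumption rather than anything scipy provides, is to search a bounded grid of integer candidates and solve for the remaining parameter at each one. For the binomial with n fixed, the MLE of the probability is the sample mean divided by n, so no continuous optimizer is needed. The name fit_binom_grid and the cap n_max are hypothetical, and estimating n this way can be unstable when the sample variance is close to or above the sample mean.

```python
import numpy as np
from scipy import stats


def fit_binom_grid(data, n_max=200):
    """Hypothetical helper: profile likelihood over integer n for stats.binom."""
    data = np.asarray(data)
    n_min = max(int(data.max()), 1)  # n cannot be smaller than the largest count
    best = None
    for n in range(n_min, n_max + 1):
        p = data.mean() / n  # MLE of p when n is held fixed
        loglike = stats.binom.logpmf(data, n, p).sum()
        if best is None or loglike > best[0]:
            best = (loglike, n, p)
    return best[1], best[2]


# Simulated data, only for illustration.
rng = np.random.default_rng(1)
sample = stats.binom.rvs(20, 0.4, size=5000, random_state=rng)
n_hat, p_hat = fit_binom_grid(sample)
```

If n_hat comes back equal to n_max, the cap was probably too small for the data and the result should not be trusted.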

A good solution would be for someone to add distribution specific fit methods, so that we have at least the easier ones available.
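As a rough sketch of what such distribution-specific fits could look like, the easiest cases have closed-form maximum likelihood estimates that only need the sample mean. The function names below are hypothetical and are not an existing scipy API; the parameterizations follow stats.poisson, stats.bernoulli and stats.geom (support starting at 1).

```python
import numpy as np
from scipy import stats


def fit_poisson(data):
    """MLE of mu for stats.poisson: the sample mean."""
    return {"mu": float(np.mean(data))}


def fit_bernoulli(data):
    """MLE of p for stats.bernoulli: the fraction of ones."""
    return {"p": float(np.mean(data))}


def fit_geom(data):
    """MLE of p for stats.geom (support 1, 2, ...): one over the sample mean."""
    return {"p": 1.0 / float(np.mean(data))}


# Simulated data, only for illustration.
rng = np.random.default_rng(2)
counts = stats.poisson.rvs(4.0, size=1000, random_state=rng)
estimate = fit_poisson(counts)


def poisson_loglike(mu):
    return stats.poisson.logpmf(counts, mu).sum()
```

Comparing poisson_loglike(estimate["mu"]) with the value at nearby mu gives a quick sanity check that the closed form really is the maximizer.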

Josef answered Oct 16 '22 16:10