I would like to find the Maximum Likelihood Estimator for some data that may be governed by a discrete distribution. But in scipy.stats only classes representing continuous distributions have a fit function to do that. What is the reason that the classes representing discrete distributions do not?
SciPy performs parameter estimation using MLE (documentation). After you fit a probability distribution to your data, you should test the goodness of fit. The Kolmogorov–Smirnov test is one widely used option.
rvs: Random variates of given type. Its arguments include the shape parameter(s) for the distribution (see docstring of the instance object for more information).
The location (loc) keyword specifies the mean. The scale (scale) keyword specifies the standard deviation. As an instance of the rv_continuous class, the norm object inherits from it a collection of generic methods (see below for the full list), and completes them with details specific for this particular distribution.
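The continuous-case workflow described above (fit by MLE, then check goodness of fit) can be sketched as follows. This is a minimal illustration on simulated data; the true values `loc=2.0` and `scale=3.0` are arbitrary assumptions, and the KS p-value tends to be optimistic when the parameters were estimated from the same sample.

```python
import numpy as np
from scipy import stats

# Simulated data; the true parameters are arbitrary choices for illustration.
rng = np.random.default_rng(0)
data = stats.norm.rvs(loc=2.0, scale=3.0, size=2000, random_state=rng)

# fit() returns the estimates in the order (shape parameters..., loc, scale).
# norm has no shape parameters, so only loc and scale come back.
loc_hat, scale_hat = stats.norm.fit(data)

# Kolmogorov-Smirnov test of the sample against the fitted normal.
ks = stats.kstest(data, "norm", args=(loc_hat, scale_hat))
print(loc_hat, scale_hat, ks.statistic, ks.pvalue)
```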
(scipy.stats) This module contains a large number of probability distributions, summary and frequency statistics, correlation functions and statistical tests, masked statistics, kernel density estimation, quasi-Monte Carlo functionality, and more.
Short answer: because nobody wrote the code for it, or even tried, as far as I know.
Longer answer: I don't know how far we can get for the discrete models with a generic maximum likelihood method like the one for the continuous distributions, which works for many but not all of those.
Most discrete distributions have strong restrictions on their parameters, and most of them will probably need a fit method specific to the distribution.
>>> [(f, getattr(stats, f).shapes) for f in dir(stats) if isinstance(getattr(stats, f), stats.distributions.rv_discrete)]
[('bernoulli', 'pr'), ('binom', 'n, pr'), ('boltzmann', 'lamda, N'),
('dlaplace', 'a'), ('geom', 'pr'), ('hypergeom', 'M, n, N'),
('logser', 'pr'), ('nbinom', 'n, pr'), ('planck', 'lamda'),
('poisson', 'mu'), ('randint', 'min, max'), ('skellam', 'mu1,mu2'),
('zipf', 'a')]
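For a few of the one-parameter distributions in this list, the MLE has a well-known closed form, so a distribution-specific fit is short. The sketch below is only an illustration: the helper names `fit_poisson`, `fit_geom`, and `fit_bernoulli` are hypothetical and not part of SciPy, and `loc` is assumed to be fixed at 0.

```python
import numpy as np
from scipy import stats

def fit_poisson(data):
    """MLE of the Poisson rate mu: the sample mean."""
    return np.asarray(data).mean()

def fit_geom(data):
    """MLE of p for scipy's geom (support 1, 2, ...): 1 / sample mean."""
    return 1.0 / np.asarray(data).mean()

def fit_bernoulli(data):
    """MLE of the success probability: the fraction of ones."""
    return np.asarray(data).mean()

rng = np.random.default_rng(0)
pois_sample = stats.poisson.rvs(mu=3.5, size=5000, random_state=rng)
geom_sample = stats.geom.rvs(p=0.3, size=5000, random_state=rng)

mu_hat = fit_poisson(pois_sample)
p_hat = fit_geom(geom_sample)
print(mu_hat, p_hat)
```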
statsmodels provides a few of the discrete models where the parameters can also depend on some explanatory variables. Most of those, like generalized linear models, need a link function to restrict the values of the parameters to the valid range, for example the interval (0, 1) for probabilities, or values larger than zero for parameters in count models.
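The link-function idea can be sketched without explanatory variables: optimize an unconstrained value and map it into the valid range before evaluating the likelihood. The code below is a rough sketch, not how statsmodels implements it; the helper names `fit_geom_logit` and `fit_poisson_log` are hypothetical. It uses a logit link for a probability and a log link for a positive rate.

```python
import numpy as np
from scipy import optimize, special, stats

def fit_geom_logit(data):
    """Fit geom's p by minimizing the negative log-likelihood over
    theta = logit(p), so the optimizer never leaves (0, 1)."""
    data = np.asarray(data)

    def nll(theta):
        p = special.expit(theta)  # inverse logit maps R -> (0, 1)
        return -stats.geom.logpmf(data, p).sum()

    res = optimize.minimize_scalar(nll)
    return special.expit(res.x)

def fit_poisson_log(data):
    """Fit Poisson's mu by minimizing over theta = log(mu), so mu > 0."""
    data = np.asarray(data)

    def nll(theta):
        return -stats.poisson.logpmf(data, np.exp(theta)).sum()

    res = optimize.minimize_scalar(nll)
    return np.exp(res.x)

rng = np.random.default_rng(0)
geom_sample = stats.geom.rvs(p=0.3, size=2000, random_state=rng)
pois_sample = stats.poisson.rvs(mu=4.0, size=2000, random_state=rng)

p_hat = fit_geom_logit(geom_sample)
mu_hat = fit_poisson_log(pois_sample)
print(p_hat, mu_hat)
```

Both distributions have closed-form MLEs (1 / sample mean and the sample mean, respectively), which makes them convenient for checking that the transformed optimization lands in the right place.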
The "n" parameter in the binomial, and parameters in some of the other distributions, are required to be integers, which makes it impossible to use the usual continuous minimizers from scipy.optimize.
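One way around the integer restriction, shown here only as a sketch, is to search over candidate integer values directly. For the binomial, the MLE of the probability given n is the sample mean divided by n, so the log-likelihood can be profiled over a grid of n. The helper name `fit_binom_profile` and the cap `n_max=100` are assumptions made for this illustration; estimating n is known to be unstable when the data are close to Poisson-like, in which case the search simply runs into the cap.

```python
import numpy as np
from scipy import stats

def fit_binom_profile(data, n_max=100):
    """Profile-likelihood fit of binom(n, p) over integer n.

    For each candidate n >= max(data), plug in p = mean / n (the MLE of
    p for that n) and keep the pair with the highest log-likelihood.
    Assumes the data contain at least one nonzero observation.
    """
    data = np.asarray(data)
    mean = data.mean()
    n_min = max(int(data.max()), 1)
    best = None
    for n in range(n_min, n_max + 1):
        p = mean / n
        loglik = stats.binom.logpmf(data, n, p).sum()
        if best is None or loglik > best[0]:
            best = (loglik, n, p)
    return best[1], best[2]

rng = np.random.default_rng(0)
sample = stats.binom.rvs(n=10, p=0.5, size=5000, random_state=rng)
n_hat, p_hat = fit_binom_profile(sample)
print(n_hat, p_hat)
```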
A good solution would be for someone to add distribution-specific fit methods, so that we have at least the easier ones available.
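Newer SciPy releases also ship a generic helper, `scipy.stats.fit`, that accepts discrete distributions and requires explicit bounds on the parameters. The sketch below assumes SciPy 1.9 or later is installed; the bounds `(0, 20)` are an arbitrary choice for this simulated sample. Its default optimizer is stochastic, so the estimate can differ slightly between runs.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = stats.poisson.rvs(mu=4.0, size=1000, random_state=rng)

# Bounds are keyed by parameter name; loc stays fixed at 0 when it is
# not given a bound. Assumes scipy.stats.fit is available (SciPy >= 1.9).
bounds = {"mu": (0, 20)}
res = stats.fit(stats.poisson, data, bounds)

mu_hat = res.params.mu
print(mu_hat, data.mean())
```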