
Getting the parameter names of scipy.stats distributions

I am writing a script to find the best-fitting distribution over a dataset using scipy.stats. I first have a list of distribution names, over which I iterate:

import scipy.stats
from scipy.stats import kstest

dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm']
for d in dists:
    dist = getattr(scipy.stats, d)
    # fit() returns the shape parameters (if any), then loc and scale
    ps = dist.fit(selected_data)
    errors.loc[d, ['D-Value', 'P-Value']] = kstest(selected_data.tolist(), d, args=ps)
    errors.loc[d, 'Params'] = ps

After this loop, I select the minimum D-Value to get the best-fitting distribution. Each distribution returns its own set of parameters in ps, and each parameter has its own name (for instance, for 'alpha' it would be alpha, whereas for 'norm' they would be mean and std).

Is there a way to get the names of the estimated parameters in scipy.stats?

Thank you in advance

user1695639 asked May 26 '15 08:05


2 Answers

Warren Weckesser and I have developed a more robust solution:

import scipy.stats

def list_parameters(distribution):
    """List parameters for scipy.stats.distribution.
    # Arguments
        distribution: a string or scipy.stats distribution object.
    # Returns
        A list of distribution parameter strings.
    """
    if isinstance(distribution, str):
        distribution = getattr(scipy.stats, distribution)
    # shapes is a comma-separated string such as 'a, b', or None
    if distribution.shapes:
        parameters = [name.strip() for name in distribution.shapes.split(',')]
    else:
        parameters = []
    # Discrete distributions take loc only; continuous ones take loc and scale
    if distribution.name in scipy.stats._discrete_distns._distn_names:
        parameters += ['loc']
    elif distribution.name in scipy.stats._continuous_distns._distn_names:
        parameters += ['loc', 'scale']
    else:
        raise ValueError("Distribution name not found in discrete or continuous lists.")
    return parameters

The discussion can be found here.
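
For readers who would rather not depend on the private `_distn_names` lists, the same idea can be sketched with the public base classes `scipy.stats.rv_continuous` and `scipy.stats.rv_discrete`. This is only an illustrative variant, not the code from the linked discussion; the function name `parameter_names` is made up for this example.

```python
import scipy.stats


def parameter_names(distribution):
    """Return parameter names in the order shapes, then loc (and scale).

    Hypothetical helper: accepts a distribution name or a scipy.stats
    distribution object such as scipy.stats.beta.
    """
    if isinstance(distribution, str):
        distribution = getattr(scipy.stats, distribution)

    # shapes is a comma-separated string such as 'a, b', or None.
    shapes = distribution.shapes
    names = [s.strip() for s in shapes.split(',')] if shapes else []

    # Continuous distributions take loc and scale; discrete ones take loc only.
    if isinstance(distribution, scipy.stats.rv_continuous):
        names += ['loc', 'scale']
    elif isinstance(distribution, scipy.stats.rv_discrete):
        names += ['loc']
    else:
        raise TypeError("Expected a scipy.stats distribution or its name.")
    return names


if __name__ == "__main__":
    for name in ['alpha', 'beta', 'norm', 'poisson']:
        print(name, parameter_names(name))
```

Raising an exception instead of exiting the interpreter lets the caller decide how to handle an unexpected object.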

Adam Erickson answered Sep 24 '22 13:09


This code demonstrates the information that ev-br gave in his answer in case anyone else lands here.

>>> from scipy import stats
>>> dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm']
>>> for d in dists:
...     dist = getattr(stats, d)
...     dist.name, dist.shapes
... 
('alpha', 'a')
('anglit', None)
('arcsine', None)
('beta', 'a, b')
('betaprime', 'a, b')
('bradford', 'c')
('norm', None)

I would point out that the shapes attribute is None for distributions such as the normal, which are parameterised only by location and scale.
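
To tie this back to the question, here is a minimal sketch of how the names could be paired with the tuple returned by `fit()`. It assumes continuous distributions only, for which `fit()` returns the shape parameters first, followed by loc and scale. The helper name `fit_with_names` and the synthetic sample are made up for illustration.

```python
import numpy as np
import scipy.stats


def fit_with_names(dist_name, data):
    """Fit a continuous scipy.stats distribution and label the estimates.

    Hypothetical helper: returns a dict mapping parameter name to value.
    """
    dist = getattr(scipy.stats, dist_name)
    # shapes is None when the distribution has only loc and scale.
    shape_names = [s.strip() for s in dist.shapes.split(',')] if dist.shapes else []
    names = shape_names + ['loc', 'scale']
    return dict(zip(names, dist.fit(data)))


if __name__ == "__main__":
    # Synthetic data, used only so the example runs on its own.
    rng = np.random.default_rng(0)
    sample = rng.normal(loc=5.0, scale=2.0, size=1000)
    print(fit_with_names('norm', sample))
```

For 'norm' the resulting keys are loc and scale, which correspond to the mean and standard deviation mentioned in the question.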

Bill Bell answered Sep 20 '22 13:09