I am writing a script to find the best-fitting distribution for a dataset using scipy.stats. I first have a list of distribution names, over which I iterate:
import scipy.stats
from scipy.stats import kstest

dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm']
for d in dists:
    dist = getattr(scipy.stats, d)
    ps = dist.fit(selected_data)  # fitted parameters as a plain tuple
    errors.loc[d, ['D-Value', 'P-Value']] = kstest(selected_data, d, args=ps)
    errors.loc[d, 'Params'] = ps
After this loop, I select the row with the minimum D-Value to get the best-fitting distribution. The problem is that each distribution returns its own set of parameters in ps, each with its own name (for instance, for 'alpha' it would be alpha, whereas for 'norm' they would be mean and std), and ps is just an unlabeled tuple.
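The selection step described above (fit each candidate, run `kstest`, keep the smallest D-Value) might look like the sketch below. It assumes pandas is available and uses synthetic normal data with a hypothetical candidate list of `norm`, `uniform` and `expon`, picked only because their fits are fast; substitute your own `selected_data` and `dists`.

```python
import numpy as np
import pandas as pd
import scipy.stats
from scipy.stats import kstest

# Synthetic stand-in for selected_data (assumption: any 1-D numeric sample works).
rng = np.random.default_rng(0)
selected_data = rng.normal(loc=5.0, scale=2.0, size=2000)

# Hypothetical candidate list; the question's list works the same way, just slower.
dists = ['norm', 'uniform', 'expon']

rows = []
for d in dists:
    dist = getattr(scipy.stats, d)
    ps = dist.fit(selected_data)  # shape estimates first (if any), then loc, scale
    stat, pvalue = kstest(selected_data, d, args=ps)
    rows.append({'Distribution': d, 'D-Value': stat, 'P-Value': pvalue, 'Params': ps})

errors = pd.DataFrame(rows).set_index('Distribution')

# The best fit is the row with the smallest KS statistic.
best = errors['D-Value'].idxmin()
print(best, errors.loc[best, 'Params'])
```

Building a list of dicts and creating the DataFrame once sidesteps the awkwardness of assigning a tuple into a single `.loc` cell.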
Is there a way to get the names of the estimated parameters in scipy.stats?
Thank you in advance.
Warren Weckesser and I have developed a more robust solution:
import scipy.stats

def list_parameters(distribution):
    """List parameters for scipy.stats.distribution.

    # Arguments
        distribution: a string or scipy.stats distribution object.

    # Returns
        A list of distribution parameter strings.
    """
    if isinstance(distribution, str):
        distribution = getattr(scipy.stats, distribution)
    # Shape parameters, if any, come first (e.g. 'a, b' for beta).
    if distribution.shapes:
        parameters = [name.strip() for name in distribution.shapes.split(',')]
    else:
        parameters = []
    # Discrete distributions take loc only; continuous ones take loc and scale.
    if distribution.name in scipy.stats._discrete_distns._distn_names:
        parameters += ['loc']
    elif distribution.name in scipy.stats._continuous_distns._distn_names:
        parameters += ['loc', 'scale']
    else:
        raise ValueError("Distribution name not found in discrete or continuous lists.")
    return parameters
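A possible variant, sketched here as an assumption rather than as part of the solution above, avoids the private `_distn_names` lists by checking the public base classes `scipy.stats.rv_discrete` and `scipy.stats.rv_continuous`, then pairs the names with the output of `fit()`, which returns the shape estimates first, followed by `loc` and `scale`. The function name `param_names` is hypothetical.

```python
import numpy as np
import scipy.stats

def param_names(distribution):
    """Hypothetical helper: parameter names in fit() order (shapes, loc[, scale])."""
    if isinstance(distribution, str):
        distribution = getattr(scipy.stats, distribution)
    shapes = distribution.shapes
    names = [s.strip() for s in shapes.split(',')] if shapes else []
    if isinstance(distribution, scipy.stats.rv_discrete):
        names.append('loc')  # discrete distributions have no scale parameter
    elif isinstance(distribution, scipy.stats.rv_continuous):
        names += ['loc', 'scale']
    else:
        raise ValueError(f"not a scipy.stats distribution: {distribution!r}")
    return names

rng = np.random.default_rng(1)
data = rng.beta(2.0, 5.0, size=500)

# Fix loc and scale so only the two shapes are estimated; fit() still returns all four.
fitted = dict(zip(param_names('beta'), scipy.stats.beta.fit(data, floc=0, fscale=1)))
print(fitted)
```

With the names attached, `fitted['a']` and `fitted['b']` can be stored or reported directly instead of relying on tuple positions.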
The discussion can be found here.
In case anyone else lands here, this code demonstrates the information that ev-br gave in his answer.
>>> from scipy import stats
>>> dists = ['alpha', 'anglit', 'arcsine', 'beta', 'betaprime', 'bradford', 'norm']
>>> for d in dists:
...     dist = getattr(stats, d)
...     dist.name, dist.shapes
...
('alpha', 'a')
('anglit', None)
('arcsine', None)
('beta', 'a, b')
('betaprime', 'a, b')
('bradford', 'c')
('norm', None)
Note that the shapes attribute is None for distributions such as the normal, which are parameterised only by location and scale.
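To make this concrete, `numargs` gives the number of shape parameters, and for `norm` (which has none) `fit()` returns only `loc` and `scale`. These are the maximum-likelihood estimates, i.e. the sample mean and the ddof=0 standard deviation, which is what the question calls mean and std. A small check on synthetic data (the sample itself is an assumption for illustration):

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(2)
data = rng.normal(loc=10.0, scale=3.0, size=1000)

for name in ['norm', 'alpha', 'beta']:
    dist = getattr(scipy.stats, name)
    print(name, dist.numargs, dist.shapes)  # number and names of shape parameters

loc, scale = scipy.stats.norm.fit(data)
# For norm, loc is the sample mean and scale the ddof=0 standard deviation.
print(np.isclose(loc, data.mean()), np.isclose(scale, data.std()))
```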