Fitting data to multimodal distributions with scipy, matplotlib

I have a dataset that I would like to fit to a known probability distribution. The intention is to use the fitted PDF in a data generator, so that I can sample data from the known (fitted) PDF. The data will be used for simulation purposes. At the moment I am just sampling from a normal distribution, which is inconsistent with the real data, so the simulation results are not accurate.

I first wanted to use the method described in: Fitting empirical distribution to theoretical ones with Scipy (Python)?

My first thought was to fit it to a Weibull distribution, but the data is actually multimodal (picture attached). So I guess I need to combine multiple distributions and then fit the data to the resulting distribution, is that right? Maybe combine a Gaussian and a Weibull distribution?

How can I use the scipy fit() function with a mixed/multimodal distribution?

I would also like to do this in Python (i.e. scipy/numpy/matplotlib), as the data generator is written in Python.

Many thanks!

histogram of data

Rosh asked Oct 15 '15 21:10



1 Answer

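I would suggest Kernel Density Estimation (KDE). Rather than fitting one parametric family, it estimates the PDF as a mixture of kernels, one centered on each data point, so it handles multimodal data naturally.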
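To make the "mixture" idea concrete, the standard univariate KDE estimate can be written as below. The symbols are my own notation, not from the question: x_i are the n data points, K is the kernel function, and h is the bandwidth.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
```

With a Gaussian kernel, this is an equal-weight mixture of n normal distributions, each with mean x_i and standard deviation h. Sampling from it amounts to picking a data point at random and adding Gaussian noise of scale h.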

SciPy only provides a Gaussian kernel (which looks fine for your specific histogram), but you can find other kernels in the statsmodels and scikit-learn packages.

For reference, these are the relevant classes:

from sklearn.neighbors import KernelDensity
from scipy.stats import gaussian_kde
from statsmodels.nonparametric.kde import KDEUnivariate
from statsmodels.nonparametric.kernel_density import KDEMultivariate
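As a rough illustration of how this could feed a data generator, here is a minimal sketch using scipy's gaussian_kde. The bimodal array `data` is synthetic and only stands in for the real dataset, and the seeds, sample sizes, and default bandwidth (Scott's rule) are assumptions you would want to adjust.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic bimodal stand-in for the real data (assumption: replace
# this with your own 1-D array of observations).
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(loc=2.0, scale=0.5, size=600),   # first mode
    rng.weibull(1.5, size=400) * 2.0 + 6.0,     # second, skewed mode
])

# Fit the KDE. The default bandwidth uses Scott's rule; pass
# bw_method=<float> to make the estimate smoother or sharper.
kde = gaussian_kde(data)

# Evaluate the estimated PDF on a grid, e.g. to plot it over a histogram.
grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 500)
density = kde(grid)

# Draw new samples from the fitted density for the simulation.
# resample returns shape (n_dims, size), so take row 0 for 1-D data.
samples = kde.resample(size=1000, seed=1)[0]
```

Plotting `density` against a normalized histogram of `data` (for example with matplotlib's `plt.hist(data, bins=50, density=True)`) is a quick way to check whether the bandwidth captures both modes before using `samples` in the generator.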

A great resource for KDE in Python is here.

Elad Joseph answered Oct 05 '22 01:10