Fitting data to multimodal distributions with scipy, matplotlib

Tags: python, matplotlib, scipy, distribution, weibull

I have a dataset that I would like to fit to a known probability distribution. The intention is to use the fitted PDF in a data generator, so that I can sample data from the fitted PDF. The data will be used for simulation purposes. At the moment I am just sampling from a normal distribution, which is inconsistent with the real data, so the simulation results are not accurate.

I first wanted to use the following method: Fitting empirical distribution to theoretical ones with Scipy (Python)?

My first thought was to fit it to a Weibull distribution, but the data is actually multimodal (picture attached). So I guess I need to combine multiple distributions and then fit the data to the resulting distribution, is that right? Maybe combine a Gaussian and a Weibull distribution?

How can I use the scipy fit() function with a mixed/multimodal distribution?

Also, I would like to do this in Python (i.e. scipy/numpy/matplotlib), as the data generator is written in Python.

Many thanks!

[Figure: histogram of data, https://i.stack.imgur.com/BzwGN.png]

asked Oct 15 '15 21:10

Rosh

1 Answer

I would suggest Kernel Density Estimation (KDE). It gives you the estimate as a mixture of PDFs, with one kernel centered on each data point.
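To make the "mixture of PDFs" point concrete, here is a minimal sketch that rebuilds a SciPy Gaussian KDE by hand as an equal-weight mixture of normal PDFs. The bimodal `data` array is synthetic and only stands in for the real dataset (an assumption), and the names `kernel_std` and `mixture_pdf` are made up for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Synthetic bimodal data standing in for the real dataset (assumption).
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(2.0, 0.5, 300), rng.normal(6.0, 1.0, 700)])

kde = gaussian_kde(data)  # bandwidth factor from Scott's rule by default

# For 1-D data, each kernel is a normal PDF whose standard deviation is
# the bandwidth factor times the sample standard deviation (ddof=1).
kernel_std = kde.factor * data.std(ddof=1)

grid = np.linspace(0.0, 10.0, 200)

# Equal-weight mixture: average one normal PDF per data point.
mixture_pdf = norm.pdf(grid[:, None], loc=data[None, :], scale=kernel_std).mean(axis=1)

# The hand-built mixture should agree with the KDE up to rounding error.
max_abs_diff = np.max(np.abs(mixture_pdf - kde(grid)))
print(max_abs_diff)
```

So the KDE is itself a mixture model, just with as many components as there are data points rather than two or three hand-picked ones.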

SciPy only provides a Gaussian kernel (which looks fine for your specific histogram), but you can find other kernels in the statsmodels or scikit-learn packages.
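As a rough illustration of a non-Gaussian kernel, the sketch below uses scikit-learn's `KernelDensity` with the Epanechnikov kernel. The data is synthetic and the bandwidth of 0.4 is hand-picked for this example (both are assumptions); in practice the bandwidth would need tuning, for instance by cross-validation.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic bimodal data standing in for the real dataset (assumption).
rng = np.random.default_rng(7)
data = np.concatenate([rng.normal(2.0, 0.5, 300), rng.normal(6.0, 1.0, 700)])

# scikit-learn expects a 2-D array of shape (n_samples, n_features).
X = data.reshape(-1, 1)

# bandwidth=0.4 is an arbitrary illustrative value, not a recommendation.
kde = KernelDensity(kernel="epanechnikov", bandwidth=0.4).fit(X)

# score_samples returns the log-density, so exponentiate to get the PDF.
grid = np.linspace(-1.0, 11.0, 1201).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))

# Sanity check: a Riemann sum of the density should be close to 1.
area = density.sum() * (grid[1, 0] - grid[0, 0])
print(area)
```

One caveat if the goal is sampling: as far as I know, `KernelDensity.sample()` is only implemented for the `gaussian` and `tophat` kernels, so other kernels are mainly useful for evaluating the density.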

For reference, these are the relevant classes:

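from sklearn.neighbors import KernelDensity  # several kernels (gaussian, tophat, epanechnikov, ...)
from scipy.stats import gaussian_kde  # Gaussian kernel only
from statsmodels.nonparametric.kde import KDEUnivariate  # 1-D data
from statsmodels.nonparametric.kernel_density import KDEMultivariate  # multivariate / mixed data types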

A great resource for KDE in Python is here.
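For the data-generator use case in the question, a minimal sketch with `scipy.stats.gaussian_kde` could look like the following. The `observed` array is synthetic and only stands in for the real measurements (an assumption), and the sample size and seed are arbitrary.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic bimodal data standing in for the real measurements (assumption).
rng = np.random.default_rng(42)
observed = np.concatenate([rng.normal(2.0, 0.5, 400), rng.normal(6.0, 1.0, 600)])

# "Fitting" a KDE just stores the data and picks a bandwidth.
kde = gaussian_kde(observed)

# resample returns an array of shape (n_dims, size); take row 0 for 1-D data.
synthetic = kde.resample(size=5000, seed=1)[0]

# The fitted PDF can also be evaluated directly, e.g. to plot it
# on top of a histogram of the observed data.
grid = np.linspace(observed.min(), observed.max(), 100)
density_on_grid = kde(grid)

print(observed.mean(), synthetic.mean())
```

Plotting `density_on_grid` against `grid` over a normalized histogram of `observed` (for example with matplotlib's `hist(..., density=True)`) is a quick way to check whether the default bandwidth oversmooths the modes.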

answered Oct 05 '22 01:10

Elad Joseph
