Fitting partial Gaussian

Q: How do you fit a Gaussian curve in Python?

A common choice is scipy.optimize.curve_fit, which uses non-linear least squares to fit data to a functional form. You can learn more about curve_fit by using the help function within a Jupyter notebook or from the SciPy online documentation. The curve_fit function has three required inputs: the function you want to fit, the x-data, and the y-data.
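
As a rough illustration of those three inputs, here is a minimal sketch that fits a single Gaussian to noisy synthetic data. The model function `gaussian`, its parameter names, the synthetic data and the starting guess `p0` are all assumptions made for this example, not something taken from the question below.

```python
import numpy as np
from scipy.optimize import curve_fit


def gaussian(x, amplitude, mean, sigma):
    """Hypothetical model function: an unnormalised Gaussian peak."""
    return amplitude * np.exp(-0.5 * ((x - mean) / sigma) ** 2)


# Synthetic data for illustration only: a known peak plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 200)
y = gaussian(x, 2.0, 0.5, 1.2) + rng.normal(scale=0.05, size=x.size)

# Required inputs: the model function, the x-data and the y-data.
# p0 is an optional starting guess; it usually helps convergence.
popt, pcov = curve_fit(gaussian, x, y, p0=[1.0, 0.0, 1.0])
amplitude_fit, mean_fit, sigma_fit = popt
print(amplitude_fit, mean_fit, sigma_fit)
```

curve_fit returns the best-fit parameters (`popt`) and their estimated covariance matrix (`pcov`); the square roots of the diagonal of `pcov` give approximate one-sigma uncertainties.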

Tags: numpy, scipy, scikit-learn, curve-fitting, gaussian

I'm trying to fit a sum of Gaussians using scikit-learn because the scikit-learn GaussianMixture seems much more robust than using curve_fit.

Problem: It doesn't do a great job of fitting a truncated part of even a single Gaussian peak:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import mixture

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
data = np.random.randn(10000)
# GaussianMixture expects a 2-D input of shape (n_samples, n_features)
data = [[x] for x in data]
clf.fit(data)
data = [item for sublist in data for item in sublist]
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
# density=True replaces the normed=True argument removed from matplotlib
h = plt.hist(data, range=(rangeMin, rangeMax), density=True)
# scipy.stats.norm.pdf replaces the removed matplotlib.mlab.normpdf
x = np.linspace(rangeMin, rangeMax)
plt.plot(x, norm.pdf(x, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])))

gives a good fit. Now changing data = [[x] for x in data] to data = [[x] for x in data if x < 0] in order to truncate the distribution returns a poor fit.

Any ideas how to get the truncation fitted properly?

Note: The distribution isn't necessarily truncated in the middle; anywhere between 50% and 100% of the full distribution could be left.

I would also be happy if anyone can point me to alternative packages. I've only tried curve_fit but couldn't get it to do anything useful as soon as more than two peaks are involved.

asked Jan 29 '17 19:01

lhcgeneva

1 Answer

A somewhat brute-force but simple solution would be to split the curve into two halves (data = [[x] for x in data if x < 0]), mirror the left part (data.append([-data[d][0]])), and then do the regular Gaussian fit.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import mixture

np.random.seed(seed=42)
n = 10000

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')

# split the data and mirror it
data = np.random.randn(n)
data = [[x] for x in data if x < 0]
n = len(data)  # number of samples in the left half
for d in range(n):
    data.append([-data[d][0]])

clf.fit(data)
data = [item for sublist in data for item in sublist]
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
# plot only the original (left-half) samples; density=True replaces normed=True
h = plt.hist(data[0:n], bins=20, range=(rangeMin, rangeMax), density=True)
# the histogram integrates to 1 over half the distribution, so scale the pdf by 2
# (scipy.stats.norm.pdf replaces the removed matplotlib.mlab.normpdf)
x = np.linspace(rangeMin, rangeMax)
plt.plot(x, 2 * norm.pdf(x, clf.means_[0, 0], np.sqrt(clf.covariances_[0, 0, 0])))

plt.show()
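
The mirroring trick above relies on the cut sitting exactly at the peak, which the question says is not guaranteed. One possible alternative, which is not from the answer above and is only a sketch, is to maximise the likelihood of a Gaussian restricted to the observed range. It assumes a single peak and that the truncation point is known; the helper `fit_truncated_gaussian` and the names `upper` and `cut` are hypothetical and chosen for this example.

```python
import numpy as np
from scipy import optimize, stats


def fit_truncated_gaussian(samples, upper):
    """Maximum-likelihood mean and std of a Gaussian observed only below `upper`."""
    samples = np.asarray(samples, dtype=float)

    def neg_log_likelihood(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)  # optimise log(sigma) so sigma stays positive
        # Density of a Gaussian restricted to x < upper is pdf(x) / cdf(upper)
        log_pdf = stats.norm.logpdf(samples, loc=mu, scale=sigma)
        log_norm = stats.norm.logcdf(upper, loc=mu, scale=sigma)
        return -(log_pdf.sum() - samples.size * log_norm)

    # Start from the (biased) moments of the truncated sample
    start = [samples.mean(), np.log(samples.std())]
    result = optimize.minimize(neg_log_likelihood, start, method="Nelder-Mead")
    mu, log_sigma = result.x
    return mu, np.exp(log_sigma)


# Synthetic check: standard normal data cut off at 0.5, so roughly 69% remains
rng = np.random.default_rng(42)
full = rng.normal(loc=0.0, scale=1.0, size=10000)
cut = 0.5
kept = full[full < cut]

mu_hat, sigma_hat = fit_truncated_gaussian(kept, upper=cut)
print(mu_hat, sigma_hat)
```

If the truncation point is not known, the largest observed sample is a crude stand-in for it. Extending this to a sum of several peaks would need a different likelihood and is not covered by this sketch.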

answered Oct 02 '22 21:10

Maximilian Peters
