Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fitting partial Gaussian

I'm trying to fit a sum of gaussians using scikit-learn because the scikit-learn GaussianMixture seems much more robust than using curve_fit.

Problem: It doesn't do a great job in fitting a truncated part of even a single gaussian peak:

from sklearn import mixture
import matplotlib.pyplot
import matplotlib.mlab
import numpy as np

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
data = np.random.randn(10000)
data = [[x] for x in data]
clf.fit(data)
data = [item for sublist in data for item in sublist]
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
h = matplotlib.pyplot.hist(data, range=(rangeMin, rangeMax), normed=True);
plt.plot(np.linspace(rangeMin, rangeMax),
         mlab.normpdf(np.linspace(rangeMin, rangeMax),
                      clf.means_, np.sqrt(clf.covariances_[0]))[0])

gives enter image description here now changing data = [[x] for x in data] to data = [[x] for x in data if x <0] in order to truncate the distribution returns enter image description here Any ideas how to get the truncation fitted properly?

Note: The distribution isn't necessarily truncated in the middle, there could be anything between 50% and 100% of the full distribution left.

I would also be happy if anyone can point me to alternative packages. I've only tried curve_fit but couldn't get it to do anything useful as soon as more than two peaks are involved.

like image 872
lhcgeneva Avatar asked Jan 29 '17 19:01

lhcgeneva


People also ask

How do you fit a Gaussian curve in Python?

It uses non-linear least squares to fit data to a functional form. You can learn more about curve_fit by using the help function within the Jupyter notebook or scipy online documentation. The curve_fit function has three required inputs: the function you want to fit, the x-data, and the y-data you fit.

Why do we use Gaussian fit?

Gaussian functions are widely used in statistics to describe the normal distributions, in signal processing to define Gaussian filters, in image processing where two-dimensional Gaussians are used for Gaussian blurs, and in mathematics to solve heat equations and diffusion equations and to define the Weierstrass ...


1 Answers

A bit brutish, but simple solution would be to split the curve in two halfs (data = [[x] for x in data if x < 0]), mirror the left part (data.append([-data[d][0]])) and then do the regular Gaussian fit.

import numpy as np
from sklearn import mixture
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

np.random.seed(seed=42)
n = 10000

clf = mixture.GaussianMixture(n_components=1, covariance_type='full')

#split the data and mirror it
data = np.random.randn(n)
data = [[x] for x in data if x < 0]
n = len(data)
for d in range(n):
    data.append([-data[d][0]])

clf.fit(data)
data = [item for sublist in data for item in sublist]
rangeMin = int(np.floor(np.min(data)))
rangeMax = int(np.ceil(np.max(data)))
h = plt.hist(data[0:n], bins=20, range=(rangeMin, rangeMax), normed=True);
plt.plot(np.linspace(rangeMin, rangeMax),
         mlab.normpdf(np.linspace(rangeMin, rangeMax),
                      clf.means_, np.sqrt(clf.covariances_[0]))[0] * 2)

plt.show()

enter image description here

like image 177
Maximilian Peters Avatar answered Oct 02 '22 21:10

Maximilian Peters