Understanding Gaussian Mixture Models

Here I first generate a sample distribution constructed from Gaussians, then fit a Gaussian mixture model (GMM) to these data. Next, I want to calculate the probability density for some given input. Conveniently, the scikit-learn implementation offers the score_samples method, which returns the log-density at each input. Now I am trying to understand these results. I always thought that I could just take the parameters of the Gaussians from the GMM fit and construct the very same distribution by summing over them and then normalising the integral to 1. However, as you can see in the plot, the density computed with score_samples (red line) fits the original data (blue histogram) perfectly, while the manually constructed distribution (black line) does not. I would like to understand where my thinking went wrong and why I can't construct the distribution myself by summing the Gaussians given by the GMM fit. Thanks a lot for any input!

asked Jan 13 '17 09:01

HansSnah

1 Answer

Just in case anyone in the future is wondering about the same thing: one has to normalise each individual component to unit area and then multiply it by its weight, rather than normalising the sum:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.mixture import GaussianMixture

# Define simple gaussian
def gauss_function(x, amp, x0, sigma):
    return amp * np.exp(-(x - x0) ** 2. / (2. * sigma ** 2.))

# Generate sample from three gaussian distributions
samples = np.random.normal(-0.5, 0.2, 2000)
samples = np.append(samples, np.random.normal(-0.1, 0.07, 5000))
samples = np.append(samples, np.random.normal(0.2, 0.13, 10000))

# Fit GMM
gmm = GaussianMixture(n_components=3, covariance_type="full", tol=0.001)
gmm = gmm.fit(X=np.expand_dims(samples, 1))

# Evaluate GMM
gmm_x = np.linspace(-2, 1.5, 5000)
gmm_y = np.exp(gmm.score_samples(gmm_x.reshape(-1, 1)))

# Construct function manually as sum of gaussians
gmm_y_sum = np.full_like(gmm_x, fill_value=0, dtype=np.float32)
for m, c, w in zip(gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_.ravel()):
    gauss = gauss_function(x=gmm_x, amp=1, x0=m, sigma=np.sqrt(c))
    gmm_y_sum += gauss / np.trapz(gauss, gmm_x) * w

# Make regular histogram
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=[8, 5])
ax.hist(samples, bins=50, density=True, alpha=0.5, color="#0070FF")  # "normed" was removed in Matplotlib 3.1
ax.plot(gmm_x, gmm_y, color="crimson", lw=4, label="GMM")
ax.plot(gmm_x, gmm_y_sum, color="black", lw=4, label="Gauss_sum", linestyle="dashed")

# Annotate diagram
ax.set_ylabel("Probability density")
ax.set_xlabel("Arbitrary units")

# Make legend
plt.legend()

plt.show()
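
To see why the order of normalisation matters, here is a minimal NumPy-only sketch. It assumes the mixture density has the form p(x) = sum_k w_k * N(x | mu_k, sigma_k^2), where each N is a unit-area Gaussian. The parameter values are not fit results: they are illustrative assumptions taken from the generating distributions above, and the helper normal_pdf is a hypothetical name. With a fitted model one would presumably plug in gmm.weights_, gmm.means_ and the square roots of gmm.covariances_ instead.

```python
import numpy as np


def normal_pdf(x, mean, sigma):
    """Unit-area Gaussian density, normalised analytically (hypothetical helper)."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


# Assumed parameters: the generating distributions, not an actual GMM fit
weights = np.array([2000, 5000, 10000]) / 17000.0
means = np.array([-0.5, -0.1, 0.2])
sigmas = np.array([0.2, 0.07, 0.13])

x = np.linspace(-2.0, 1.5, 5000)
dx = x[1] - x[0]

# Correct: every component has unit area, the weights then sum the areas to 1
mixture = sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

# Incorrect: weight unit-amplitude bumps, then normalise the sum.
# A bump's area is proportional to its sigma, so wide components get too much mass.
bumps = sum(w * np.exp(-0.5 * ((x - m) / s) ** 2) for w, m, s in zip(weights, means, sigmas))
wrong = bumps / (bumps.sum() * dx)

area_mixture = mixture.sum() * dx          # simple Riemann sum over the grid
area_wrong = wrong.sum() * dx
max_diff = np.max(np.abs(mixture - wrong))

print(f"area of mixture: {area_mixture:.4f}")
print(f"area of normalised sum: {area_wrong:.4f}")
print(f"largest pointwise difference: {max_diff:.3f}")
```

Both curves integrate to roughly 1, yet they differ in shape: normalising only the sum effectively rescales each weight by the component's sigma, which is why the manually constructed curve in the question did not match the histogram.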

answered Oct 09 '22 07:10

HansSnah

Tags: python, scikit-learn
