KL-Divergence of two GMMs

Tags: python, numpy, statistics, scipy, scikit-learn

I have two GMMs that I used to fit two different sets of data in the same space, and I would like to calculate the KL-divergence between them.

Currently I am using the GMMs defined in sklearn (http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GMM.html) and the SciPy implementation of KL-divergence (http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.entropy.html).

How would I go about doing this? Do I want to just create tons of random points, get their probabilities on each of the two models (call them P and Q) and then use those probabilities as my input? Or is there some more canonical way to do this within the SciPy/SKLearn environment?

385

asked Sep 27 '14 22:09

Andrew Latham

1 Answer

There's no closed form for the KL divergence between GMMs. You can easily do Monte Carlo, though. Recall that KL(p||q) = \int p(x) log(p(x) / q(x)) dx = E_p[log(p(x) / q(x))]. So:

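def gmm_kl(gmm_p, gmm_q, n_samples=10**5):
    # Draw samples from p; the expectation in KL(p||q) is taken under p.
    X = gmm_p.sample(n_samples)
    # sklearn.mixture.GMM.score_samples returns (log-density, responsibilities).
    log_p_X, _ = gmm_p.score_samples(X)
    log_q_X, _ = gmm_q.score_samples(X)
    # Monte Carlo estimate of E_p[log p(x) - log q(x)].
    return log_p_X.mean() - log_q_X.mean()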

(Since mean(log(p(x) / q(x))) = mean(log(p(x)) - log(q(x))) = mean(log(p(x))) - mean(log(q(x))), subtracting the two means is somewhat cheaper computationally than averaging the elementwise differences.)
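Newer scikit-learn releases replace sklearn.mixture.GMM with sklearn.mixture.GaussianMixture, whose methods return slightly different things: sample(n) returns a (samples, component_labels) pair, and score_samples(X) returns only the per-sample log-densities. Below is a minimal sketch of the same estimator under that assumption. The name gmm_kl_mc and the extra standard-error return value are illustrative additions, not part of the original answer.

```python
import numpy as np


def gmm_kl_mc(gmm_p, gmm_q, n_samples=10**5):
    """Monte Carlo estimate of KL(p || q), plus its standard error.

    Assumes both models follow the sklearn.mixture.GaussianMixture
    interface: sample(n) -> (X, labels), score_samples(X) -> log-densities.
    """
    X, _ = gmm_p.sample(n_samples)
    # Per-sample log(p(x) / q(x)) for x drawn from p.
    log_ratio = gmm_p.score_samples(X) - gmm_q.score_samples(X)
    estimate = log_ratio.mean()
    # Standard error of the sample mean; it shrinks like 1/sqrt(n_samples).
    std_error = log_ratio.std(ddof=1) / np.sqrt(n_samples)
    return estimate, std_error
```

With two fitted GaussianMixture models, calling gmm_kl_mc(gmm_p, gmm_q) would return the estimate and a rough indication of how much it might move between runs; if the standard error is too large for your purposes, increase n_samples.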

You don't want to use scipy.stats.entropy; that's for discrete distributions.
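One way to gain some confidence in a sampling estimator of E_p[log(p(x) / q(x))] is to run it on a pair of distributions whose KL divergence is known exactly. The sketch below does this for two univariate Gaussians, where KL(p||q) = log(sigma_q / sigma_p) + (sigma_p^2 + (mu_p - mu_q)^2) / (2 sigma_q^2) - 1/2. The function names and the parameter values are illustrative choices, and only NumPy is used.

```python
import numpy as np


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a univariate normal N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)


def kl_gauss_exact(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(p || q) for two univariate normals."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


def kl_gauss_mc(mu_p, sigma_p, mu_q, sigma_q, n_samples=10**5, seed=0):
    """Monte Carlo KL(p || q): average log(p/q) over samples drawn from p."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_p, sigma_p, size=n_samples)
    return np.mean(gaussian_logpdf(x, mu_p, sigma_p)
                   - gaussian_logpdf(x, mu_q, sigma_q))


exact = kl_gauss_exact(0.0, 1.0, 1.0, 2.0)
approx = kl_gauss_mc(0.0, 1.0, 1.0, 2.0)
print(f"exact={exact:.4f}  monte_carlo={approx:.4f}")
```

The two printed numbers should agree to within a few multiples of the Monte Carlo standard error. The same kind of check does not exist for general GMMs, which is why sampling is used there in the first place.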

If you want the symmetrized and smoothed Jensen-Shannon divergence (KL(p||(p+q)/2) + KL(q||(p+q)/2)) / 2 instead, it's pretty similar:

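import numpy as np

def gmm_js(gmm_p, gmm_q, n_samples=10**5):
    # Samples from p, for the KL(p || (p+q)/2) term.
    X = gmm_p.sample(n_samples)
    log_p_X, _ = gmm_p.score_samples(X)
    log_q_X, _ = gmm_q.score_samples(X)
    # log(p(x) + q(x)), computed stably in log space.
    log_mix_X = np.logaddexp(log_p_X, log_q_X)

    # Samples from q, for the KL(q || (p+q)/2) term.
    Y = gmm_q.sample(n_samples)
    log_p_Y, _ = gmm_p.score_samples(Y)
    log_q_Y, _ = gmm_q.score_samples(Y)
    log_mix_Y = np.logaddexp(log_p_Y, log_q_Y)

    # Subtracting log(2) turns log(p + q) into log((p + q) / 2).
    return (log_p_X.mean() - (log_mix_X.mean() - np.log(2))
            + log_q_Y.mean() - (log_mix_Y.mean() - np.log(2))) / 2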

(log_mix_X and log_mix_Y are actually the logs of twice the mixture density (p+q)/2, that is, log(p+q); subtracting the constant log(2) once after taking the mean, rather than from every element, saves some flops.)
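For models that follow the newer sklearn.mixture.GaussianMixture interface, where sample(n) returns (samples, component_labels) and score_samples(X) returns only log-densities, the Jensen-Shannon estimate could be sketched as follows. The names gmm_js_mc and kl_to_mixture are hypothetical, and the interface assumption is noted in the docstring.

```python
import numpy as np


def gmm_js_mc(gmm_p, gmm_q, n_samples=10**5):
    """Monte Carlo estimate of JS(p, q) = (KL(p||m) + KL(q||m)) / 2, m = (p+q)/2.

    Assumes both models follow the sklearn.mixture.GaussianMixture
    interface: sample(n) -> (X, labels), score_samples(X) -> log-densities.
    """
    def kl_to_mixture(gmm_a, gmm_b):
        # Estimate KL(a || m) as the mean of log a(x) - log m(x) for x ~ a.
        X, _ = gmm_a.sample(n_samples)
        log_a = gmm_a.score_samples(X)
        log_b = gmm_b.score_samples(X)
        # log m(x) = log(a(x) + b(x)) - log(2), computed stably in log space.
        log_m = np.logaddexp(log_a, log_b) - np.log(2)
        return np.mean(log_a - log_m)

    return 0.5 * (kl_to_mixture(gmm_p, gmm_q) + kl_to_mixture(gmm_q, gmm_p))
```

A useful property for checking results: with natural logarithms the Jensen-Shannon divergence lies between 0 (identical distributions) and log(2) (essentially non-overlapping ones), so an estimate well outside that range points to a bug rather than sampling noise.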

147

answered Sep 20 '22 21:09

Danica
