I have another question that I was hoping someone could help me with.
I'm using the Jensen-Shannon divergence to measure the similarity between two probability distributions. The similarity scores appear to be correct in the sense that they fall between 0 and 1, given that one uses the base-2 logarithm, with 0 meaning that the distributions are equal.
However, I'm not sure whether there is in fact an error somewhere and was wondering whether someone might be able to say 'yes it's correct' or 'no, you did something wrong'.
Here is the code:
from numpy import zeros, array
from math import sqrt, log

class JSD(object):

    def __init__(self):
        self.log2 = log(2)

    def KL_divergence(self, p, q):
        """ Compute KL divergence of two vectors, K(p || q)."""
        return sum(p[x] * log((p[x]) / (q[x])) for x in range(len(p)) if p[x] != 0.0 or p[x] != 0)

    def Jensen_Shannon_divergence(self, p, q):
        """ Returns the Jensen-Shannon divergence. """
        self.JSD = 0.0
        weight = 0.5
        average = zeros(len(p))  # Average
        for x in range(len(p)):
            average[x] = weight * p[x] + (1 - weight) * q[x]
        self.JSD = (weight * self.KL_divergence(array(p), average)) + ((1 - weight) * self.KL_divergence(array(q), average))
        return 1 - (self.JSD / sqrt(2 * self.log2))

if __name__ == '__main__':
    J = JSD()
    p = [1.0/10, 9.0/10, 0]
    q = [0, 1.0/10, 9.0/10]
    print J.Jensen_Shannon_divergence(p, q)
The problem is that I feel that the scores are not high enough when comparing two text documents, for instance. However, this is purely a subjective feeling.
Any help is, as always, appreciated.
The square root of the Jensen-Shannon divergence is a metric, which is why it is sometimes called the Jensen-Shannon distance. The JS divergence is used to address problems the KL divergence has in high-dimensional spaces, for example in Generative Adversarial Networks.
The Jensen-Shannon divergence (JS) measures how much two probability distributions diverge from each other. It is based on the Kullback-Leibler divergence, but unlike KL it is symmetric.
Main result: the square root of the quantum Jensen-Shannon divergence, given by

$$J(A, B) = \tfrac{1}{2}\operatorname{Tr}\eta(A) + \tfrac{1}{2}\operatorname{Tr}\eta(B) - \operatorname{Tr}\eta\!\left(\tfrac{A+B}{2}\right), \qquad A, B \in M_n^+(\mathbb{C}),$$

where $\eta(x) = -x \log x$, is a true metric.
The Kullback-Leibler divergence, or KL divergence, quantifies how much one probability distribution differs from another. The KL divergence of P from Q is often written using the notation KL(P || Q).
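For reference, written out for discrete distributions $P$ and $Q$ with mixture $M = \tfrac{1}{2}(P + Q)$, the two quantities are:

$$\mathrm{KL}(P \,\|\, Q) = \sum_x P(x)\,\log\frac{P(x)}{Q(x)}, \qquad \mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M).$$

With the base-2 logarithm the JSD lies in [0, 1]; with the natural logarithm it lies in [0, ln 2].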
Note that the scipy entropy call below is the Kullback-Leibler divergence.
See: http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
#!/usr/bin/env python
from scipy.stats import entropy
from numpy.linalg import norm
import numpy as np

def JSD(P, Q):
    _P = P / norm(P, ord=1)   # normalize P to a probability vector
    _Q = Q / norm(Q, ord=1)   # normalize Q to a probability vector
    _M = 0.5 * (_P + _Q)      # mixture (pointwise average) distribution
    # average of the two KL divergences against the mixture (natural log)
    return 0.5 * (entropy(_P, _M) + entropy(_Q, _M))
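For example, with the distributions from the question (the result is in nats, since scipy.stats.entropy uses the natural logarithm by default):

p = np.array([1.0/10, 9.0/10, 0])
q = np.array([0, 1.0/10, 9.0/10])
print(JSD(p, q))  # ~0.5306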
Also, double-check the test case in the question: each input distribution should sum to 1.0. The JSD function above normalizes its inputs with the L1 norm, so it is robust to that.
See: http://www.itl.nist.gov/div898/handbook/eda/section3/eda361.htm
Since the Jensen-Shannon distance (distance.jensenshannon) has been included in Scipy 1.2, the Jensen-Shannon divergence can be obtained as the square of the Jensen-Shannon distance:
from scipy.spatial import distance
distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10]) ** 2
# 0.5306056938642212
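Note that distance.jensenshannon uses the natural logarithm by default, so the value above is in nats. Passing base=2 instead gives a divergence bounded by 1:

distance.jensenshannon([1.0/10, 9.0/10, 0], [0, 1.0/10, 9.0/10], base=2) ** 2
# ~0.7655 (the value above divided by ln 2)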
Get some data for distributions with known divergence and compare your results against those known values.
BTW: the sum in KL_divergence may be rewritten using the zip built-in function like this:
sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)
This does away with lots of "noise" and is also much more "pythonic". The double comparison with 0.0 and 0 is not necessary.
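Dropped back into the class from the question, the method could then read as follows (a sketch; like the original, it assumes q is nonzero wherever p is nonzero, which holds when q is the mixture distribution):

def KL_divergence(self, p, q):
    """ Compute KL divergence of two vectors, K(p || q)."""
    # terms with p[x] == 0 contribute 0 by convention and are skipped
    return sum(_p * log(_p / _q) for _p, _q in zip(p, q) if _p != 0)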