Interpreting scipy.stats.entropy values

I am trying to use scipy.stats.entropy to estimate the Kullback–Leibler (KL) divergence between two distributions. More specifically, I would like to use the KL as a metric to decide how consistent two distributions are.

However, I cannot interpret the KL values. For example:

t1 = numpy.random.normal(-2.5, 0.1, 1000)
t2 = numpy.random.normal(-2.5, 0.1, 1000)
scipy.stats.entropy(t1, t2)
= 0.0015539217193737955

Then,

t1 = numpy.random.normal(-2.5, 0.1, 1000)
t2 = numpy.random.normal(2.5, 0.1, 1000)
scipy.stats.entropy(t1, t2)
= 0.0015908295787942181

How can completely different distributions with essentially no overlap have the same KL value?

t1 = numpy.random.normal(-2.5, 0.1, 1000)
t2 = numpy.random.normal(25., 0.1, 1000)
scipy.stats.entropy(t1, t2)
= 0.00081111364805590595

This one gives an even smaller KL value (i.e. distance), which I would be inclined to interpret as "more consistent".

Any insights on how to interpret the output of scipy.stats.entropy (i.e., the KL divergence) in this context?

asked Nov 04 '14 19:11

Scientist

1 Answer

numpy.random.normal(-2.5,0.1,1000) is a sample from a normal distribution. It's just 1000 numbers in a random order. The documentation for entropy says:

pk[i] is the (possibly unnormalized) probability of event i.

So to get a meaningful result, you need the numbers to be "aligned" so that the same indices correspond to the same positions in the distribution. In your example t1[0] has no relationship to t2[0]. Your sample doesn't provide any direct information about how probable each value is, which is what you need for the KL divergence; it just gives you some actual values that were drawn from the distribution.
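To make the index alignment concrete, here is a minimal sketch of what entropy does when given two arrays. The arrays pk and qk are made-up illustrative weights, not taken from the question; the manual computation assumes the documented behaviour that both inputs are normalized to sum to 1 before the divergence is computed.

```python
import numpy as np
from scipy import stats

# Hypothetical unnormalized weights for four events (illustration only).
pk = np.array([1.0, 2.0, 3.0, 4.0])
qk = np.array([4.0, 3.0, 2.0, 1.0])

# Manual KL divergence: normalize, then sum p[i] * log(p[i] / q[i]).
p = pk / pk.sum()
q = qk / qk.sum()
manual = np.sum(p * np.log(p / q))

# Same quantity via SciPy.
library = stats.entropy(pk, qk)

# Reordering one array changes which entries are compared.
# Here qk reversed equals pk, so the divergence drops to zero.
reordered = stats.entropy(pk, qk[::-1])

print(manual, library, reordered)
```

The result depends entirely on which entry of qk sits at the same index as each entry of pk, which is why arrays of raw samples in random order give values that say nothing about the underlying distributions.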

The most straightforward way to get aligned values is to evaluate the distribution's probability density function at some fixed set of values. To do this, you need to use scipy.stats.norm (which returns a distribution object that can be manipulated in various ways) instead of np.random.normal (which only returns sampled values). Here's an example:

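import numpy as np
from scipy import stats

# frozen normal distributions: same standard deviation, shifted means
t1 = stats.norm(-2.5, 0.1)
t2 = stats.norm(-2.5, 0.1)
t3 = stats.norm(-2.4, 0.1)
t4 = stats.norm(-2.3, 0.1)

# domain to evaluate PDF on
x = np.linspace(-5, 5, 100)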

Then:

>>> stats.entropy(t1.pdf(x), t2.pdf(x))
-0.0
>>> stats.entropy(t1.pdf(x), t3.pdf(x))
0.49999995020647586
>>> stats.entropy(t1.pdf(x), t4.pdf(x))
1.999999900414918
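As a cross-check, these numbers match the standard closed-form KL divergence between two normal distributions with the same standard deviation. The symbols \mu_1, \mu_2 and \sigma below are notation introduced here for the two means and the shared standard deviation:

```latex
D_{\mathrm{KL}}\bigl(\mathcal{N}(\mu_1,\sigma^2)\,\|\,\mathcal{N}(\mu_2,\sigma^2)\bigr)
  = \frac{(\mu_1-\mu_2)^2}{2\sigma^2}
% sigma = 0.1, |mu_1 - mu_2| = 0.1:  0.01 / 0.02 = 0.5
% sigma = 0.1, |mu_1 - mu_2| = 0.2:  0.04 / 0.02 = 2.0
```

The values computed on the grid are discrete approximations of this integral, so they agree only up to a small numerical error.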

You can see that as the distributions move further apart, their KL divergence increases. (In fact, using your second example will give a KL divergence of inf: the two densities overlap so little that, wherever one is nonzero on the grid, the other underflows to zero in floating point.)
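If you only have samples and no known density, one common workaround (a different technique from the PDF evaluation above, sketched here under stated assumptions) is to histogram both samples on the same bin edges, so that index i refers to the same interval in both arrays. The helper kl_from_samples, the bin count, and the add-one pseudocount used to avoid empty bins are all illustrative choices, and the estimate depends on them.

```python
import numpy as np
from scipy import stats

def kl_from_samples(a, b, bins=30, pseudocount=1.0):
    """Rough KL estimate from two samples using shared histogram bins.

    Hypothetical helper: the bin count and pseudocount are assumptions,
    and the result changes with both.
    """
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, bins + 1)   # same edges for both samples
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # The pseudocount keeps every bin nonzero so the result stays finite.
    return stats.entropy(p + pseudocount, q + pseudocount)

rng = np.random.default_rng(0)
base = rng.normal(-2.5, 0.1, 1000)

kl_same = kl_from_samples(base, rng.normal(-2.5, 0.1, 1000))
kl_near = kl_from_samples(base, rng.normal(-2.4, 0.1, 1000))
kl_far = kl_from_samples(base, rng.normal(2.5, 0.1, 1000))

print(kl_same, kl_near, kl_far)
```

With this kind of binning the estimate grows as the samples move apart, unlike the values obtained by passing the raw samples directly. It remains a coarse estimate, so treat it as a relative indicator, not an exact divergence.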

answered Sep 28 '22 05:09

BrenBarn

Tags: python, statistics, scipy, entropy
