How to calculate entropy from np.histogram

I have an example of a histogram with:

mu1 = 10, sigma1 = 10
s1 = np.random.normal(mu1, sigma1, 100000)

and calculated

hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
for i in hist1[0]:
    ent = -sum(i * log(abs(i)))
print (ent)

Now I want to find the entropy from the resulting histogram, but since np.histogram returns two arrays, I'm having trouble calculating the entropy. How can I use just the first array returned by np.histogram to calculate the entropy? Even if my code above were otherwise correct, I would also get a math domain error. :(

Edit: How do I find the entropy when mu = 0? log(0) yields a math domain error.


So the actual code I'm trying to write is:

mu1, sigma1 = 0, 1
mu2, sigma2 = 10, 1
s1 = np.random.normal(mu1, sigma1, 100000)
s2 = np.random.normal(mu2, sigma2, 100000)

hist1 = np.histogram(s1, bins=100, range=(-20,20), density=True)
data1 = hist1[0]
ent1 = -(data1*np.log(np.abs(data1))).sum() 

hist2 = np.histogram(s2, bins=100, range=(-20,20), density=True)
data2 = hist2[0]
ent2 = -(data2*np.log(np.abs(data2))).sum() 

So far, the first example (ent1) yields nan, and the second (ent2) yields a math domain error. :(

Vinci asked Dec 25 '22

2 Answers

You can calculate the entropy using vectorized code:

import numpy as np

mu1 = 10
sigma1 = 10

s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
data = hist1[0]
ent = -(data*np.log(np.abs(data))).sum()
# output: 7.1802159512213191

But if you prefer a for loop, you can write:

import numpy as np
import math

mu1 = 10
sigma1 = 10

s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
ent = 0
for i in hist1[0]:
    ent -= i * math.log(abs(i))
print(ent)
# output: 7.1802159512213191
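
The nan (and, with math.log, the math domain error) in the question's edit comes from empty bins: with a narrow distribution and a wide range, many bins are zero, and log(0) is undefined. A minimal sketch of two common workarounds, assuming the same setup as the question's edit:

import numpy as np

mu1, sigma1 = 0, 1
s1 = np.random.normal(mu1, sigma1, 100000)

# With range=(-20, 20), many bins are empty, so log(0) produces nan/-inf.
hist1 = np.histogram(s1, bins=100, range=(-20, 20), density=True)
data1 = hist1[0]

# Option 1: keep only the non-zero bins before taking the log.
nonzero = data1[data1 > 0]
ent1 = -(nonzero * np.log(nonzero)).sum()

# Option 2: use numpy's masked log, which skips the zero entries.
ent1_masked = -(data1 * np.ma.log(data1)).sum()

print(ent1, ent1_masked)
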
Mahdi answered Dec 26 '22

So for the ultimate copy-paste experience, I merged both existing answers (thank you all) into a more comprehensive, numpy-native approach. Hope it helps!

import numpy as np

def entropy(hist, bit_instead_of_nat=False):
    """
    Given a list of positive values as a histogram drawn from any information source,
    returns the entropy of its probability mass function. Usage example:
      hist = [513, 487]  # we tossed a coin 1000 times and this is our histogram
      print(entropy(hist, True))  # the result is approximately 1 bit
      hist = [-1, 10, 10]; hist = [0]  # these kinds of inputs will trigger the warning
    """
    h = np.asarray(hist, dtype=np.float64)
    if h.sum() <= 0 or (h < 0).any():
        print("[entropy] WARNING, malformed/empty input %s. Returning None." % str(hist))
        return None
    h = h / h.sum()
    log_fn = np.ma.log2 if bit_instead_of_nat else np.ma.log
    return -(h * log_fn(h)).sum()
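
For example, you can feed it the raw counts from np.histogram directly (with density=False), since the function normalizes the histogram itself:

s = np.random.normal(0, 1, 100000)
counts, edges = np.histogram(s, bins=100, range=(-20, 20))  # raw counts, no density
print(entropy(counts))        # entropy in nats
print(entropy(counts, True))  # entropy in bits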

Note: a probability density function and a probability mass function behave differently on a discrete histogram, depending on the bin size. See the np.histogram docstring:

density : bool, optional

If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function.

Overrides the normed keyword if given.
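
Concretely, with density=True the returned values are densities, not probabilities, so they need to be multiplied by the bin width to obtain a probability mass per bin. A small illustrative sketch, assuming uniform bins:

import numpy as np

s = np.random.normal(0, 1, 100000)
dens, edges = np.histogram(s, bins=100, range=(-20, 20), density=True)
width = edges[1] - edges[0]   # uniform bin width
p = dens * width              # probability mass per bin, sums to ~1
p = p[p > 0]                  # drop empty bins to avoid log(0)
print(p.sum(), -(p * np.log(p)).sum())  # ~1.0 and the Shannon entropy in nats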

fr_andres answered Dec 26 '22