How to calculate entropy from np.histogram

I have an example of a histogram with:

mu1 = 10, sigma1 = 10
s1 = np.random.normal(mu1, sigma1, 100000)

and calculated

hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
for i in hist1[0]:
    ent = -sum(i * log(abs(i)))
print (ent)

Now I want to find the entropy from the resulting histogram, but since np.histogram returns two arrays, I'm having trouble calculating the entropy. How can I use just the first array returned by np.histogram to calculate the entropy? Even if my code above were otherwise correct, I would also get a math domain error. :(

Edit: How do I find the entropy when mu = 0? log(0) yields a math domain error.


So the actual code I'm trying to write is:

mu1, sigma1 = 0, 1
mu2, sigma2 = 10, 1
s1 = np.random.normal(mu1, sigma1, 100000)
s2 = np.random.normal(mu2, sigma2, 100000)

hist1 = np.histogram(s1, bins=100, range=(-20,20), density=True)
data1 = hist1[0]
ent1 = -(data1*np.log(np.abs(data1))).sum() 

hist2 = np.histogram(s2, bins=100, range=(-20,20), density=True)
data2 = hist2[0]
ent2 = -(data2*np.log(np.abs(data2))).sum() 

So far, the first example (ent1) yields nan, and the second (ent2) yields a math domain error. :(

Vinci asked Dec 25 '22

2 Answers

You can calculate the entropy using vectorized code:

import numpy as np

mu1 = 10
sigma1 = 10

s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
data = hist1[0]
ent = -(data*np.log(np.abs(data))).sum()
# output: 7.1802159512213191

But if you prefer a for loop, you can write:

import numpy as np
import math

mu1 = 10
sigma1 = 10

s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
ent = 0
for i in hist1[0]:
    ent -= i * math.log(abs(i))
print(ent)
# output: 7.1802159512213191
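
The nan (and, with math.log, the math domain error) in the question's edit comes from empty bins: with a narrow distribution and a wide range, many bins are zero, and log(0) is undefined. A minimal sketch of two common workarounds, assuming the same setup as the question's edit:

import numpy as np

mu1, sigma1 = 0, 1
s1 = np.random.normal(mu1, sigma1, 100000)

# With range=(-20, 20), many bins are empty, so log(0) produces nan/-inf.
hist1 = np.histogram(s1, bins=100, range=(-20, 20), density=True)
data1 = hist1[0]

# Option 1: keep only the non-zero bins before taking the log.
nonzero = data1[data1 > 0]
ent1 = -(nonzero * np.log(nonzero)).sum()

# Option 2: use numpy's masked log, which skips the zero entries.
ent1_masked = -(data1 * np.ma.log(data1)).sum()

print(ent1, ent1_masked)
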
Mahdi answered Dec 26 '22

So for the ultimate copy-paste experience, I merged both existing answers (thank you all) into a more comprehensive, numpy-native approach. Hope it helps!

import numpy as np

def entropy(hist, bit_instead_of_nat=False):
    """
    Given a list of positive values as a histogram drawn from any information source,
    returns the entropy of its probability mass function. Usage example:
      hist = [513, 487]  # we tossed a coin 1000 times and this is our histogram
      print(entropy(hist, True))  # the result is approximately 1 bit
      hist = [-1, 10, 10]; hist = [0]  # these kinds of inputs will trigger the warning
    """
    h = np.asarray(hist, dtype=np.float64)
    if h.sum() <= 0 or (h < 0).any():
        print("[entropy] WARNING, malformed/empty input %s. Returning None." % str(hist))
        return None
    h = h / h.sum()
    log_fn = np.ma.log2 if bit_instead_of_nat else np.ma.log
    return -(h * log_fn(h)).sum()
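
For example, you can feed it the raw counts from np.histogram directly (with density=False), since the function normalizes the histogram itself:

s = np.random.normal(0, 1, 100000)
counts, edges = np.histogram(s, bins=100, range=(-20, 20))  # raw counts, no density
print(entropy(counts))        # entropy in nats
print(entropy(counts, True))  # entropy in bits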

Note: a probability density function and a probability mass function behave differently on a discrete histogram, depending on the bin size. See the np.histogram docstring:

density : bool, optional

If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function.

Overrides the normed keyword if given.
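
Concretely, with density=True the returned values are densities, not probabilities, so they need to be multiplied by the bin width to obtain a probability mass per bin. A small illustrative sketch, assuming uniform bins:

import numpy as np

s = np.random.normal(0, 1, 100000)
dens, edges = np.histogram(s, bins=100, range=(-20, 20), density=True)
width = edges[1] - edges[0]   # uniform bin width
p = dens * width              # probability mass per bin, sums to ~1
p = p[p > 0]                  # drop empty bins to avoid log(0)
print(p.sum(), -(p * np.log(p)).sum())  # ~1.0 and the Shannon entropy in nats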

fr_andres answered Dec 26 '22