I have an example of a histogram with:
mu1 = 10, sigma1 = 10
s1 = np.random.normal(mu1, sigma1, 100000)
and calculated
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
for i in hist1[0]:
    ent = -sum(i * log(abs(i)))
    print(ent)
Now I want to find the entropy from the given histogram array, but since np.histogram returns two arrays, I'm having trouble calculating the entropy. How can I use just the first array returned by np.histogram to calculate the entropy? Also, even if the code above is otherwise correct, I get a math domain error. :(
**Edit:** How do I find the entropy when mu = 0, given that log(0) yields a math domain error?
So the actual code I'm trying to write is:
mu1, sigma1 = 0, 1
mu2, sigma2 = 10, 1
s1 = np.random.normal(mu1, sigma1, 100000)
s2 = np.random.normal(mu2, sigma2, 100000)
hist1 = np.histogram(s1, bins=100, range=(-20,20), density=True)
data1 = hist1[0]
ent1 = -(data1*np.log(np.abs(data1))).sum()
hist2 = np.histogram(s2, bins=100, range=(-20,20), density=True)
data2 = hist2[0]
ent2 = -(data2*np.log(np.abs(data2))).sum()
So far, the first example, ent1, yields nan, and the second, ent2, yields a math domain error :(
You can calculate the entropy using vectorized code:
import numpy as np
mu1 = 10
sigma1 = 10
s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
data = hist1[0]
ent = -(data*np.log(np.abs(data))).sum()
# output: 7.1802159512213191
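Note that if some bins are empty (quite likely here, since mu1 = 10 sits at the edge of range=(-10, 10)), data will contain zeros, np.log(0) evaluates to -inf, and the sum turns into nan. A minimal sketch of one way around that, assuming empty bins should simply contribute nothing, uses numpy's masked log:

import numpy as np

mu1, sigma1 = 10, 10
s1 = np.random.normal(mu1, sigma1, 100000)
data, edges = np.histogram(s1, bins=50, range=(-10, 10), density=True)

# np.ma.log masks zero entries instead of producing -inf,
# so empty bins drop out of the sum entirely
ent = -(data * np.ma.log(data)).sum()
print(ent)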
But if you prefer to use a for loop, you may write:
import numpy as np
import math
mu1 = 10
sigma1 = 10
s1 = np.random.normal(mu1, sigma1, 100000)
hist1 = np.histogram(s1, bins=50, range=(-10,10), density=True)
ent = 0
for i in hist1[0]:
    ent -= i * math.log(abs(i))
print(ent)
# output: 7.1802159512213191
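With the loop, the same log(0) problem surfaces as a math domain error from math.log when a bin is empty. A small sketch building on the snippet above, assuming you want to skip empty bins rather than fail:

ent = 0
for i in hist1[0]:
    if i > 0:  # math.log(0) raises a math domain error, so skip empty bins
        ent -= i * math.log(i)
print(ent)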
So for the ultimate copypaste experience, I just merged both existing answers (thank you all) into a more comprehensive numpy-native approach. Hope it helps!
import numpy as np

def entropy(hist, bit_instead_of_nat=False):
    """
    Given a list of positive values as a histogram drawn from any information source,
    returns the entropy of its probability mass function. Usage example:
        hist = [513, 487]  # we tossed a coin 1000 times and this is our histogram
        print(entropy(hist, True))  # the result is approximately 1 bit
        hist = [-1, 10, 10]; hist = [0]  # these kinds of inputs will trigger the warning
    """
    h = np.asarray(hist, dtype=np.float64)
    if h.sum() <= 0 or (h < 0).any():
        print("[entropy] WARNING, malformed/empty input %s. Returning None." % str(hist))
        return None
    h = h / h.sum()  # normalize counts into a probability mass function
    # masked logs treat zero bins as zero contribution instead of -inf
    log_fn = np.ma.log2 if bit_instead_of_nat else np.ma.log
    return -(h * log_fn(h)).sum()
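For illustration, here is a hypothetical call using raw counts from np.histogram (the function normalizes them itself, so density=True is not needed):

s1 = np.random.normal(0, 1, 100000)
counts, _ = np.histogram(s1, bins=100, range=(-20, 20))  # raw counts, not a density
print(entropy(counts))                           # entropy in nats
print(entropy(counts, bit_instead_of_nat=True))  # entropy in bits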
Note: a probability density function and a probability mass function behave differently on discrete histograms, depending on the bin size. See the np.histogram docstring:
density : bool, optional
    If False, the result will contain the number of samples in each bin. If True, the result is the value of the probability density function at the bin, normalized such that the integral over the range is 1. Note that the sum of the histogram values will not be equal to 1 unless bins of unity width are chosen; it is not a probability mass function. Overrides the normed keyword if given.
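If you do pass density=True, you can recover a proper probability mass function by multiplying each density value by its bin width before computing the entropy. A minimal sketch (the variable names are my own):

import numpy as np

s1 = np.random.normal(0, 1, 100000)
density, edges = np.histogram(s1, bins=100, range=(-20, 20), density=True)

# density * bin_width gives per-bin probabilities that sum to 1
pmf = density * np.diff(edges)
ent = -(pmf * np.ma.log(pmf)).sum()  # masked log skips empty bins
print(ent)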