I have some statistics over a set of properties, like:
1st iter : p1:10 p2:0 p3:12 p4:33 p5:0.17 p6:ok p8:133 p9:89
2nd iter : p1:43 p2:1 p6:ok p8:12 p9:33
3rd iter : p1:14 p2:0 p3:33 p5:0.13 p9:2
...
(p1 -> number of tries, p2 -> whether the try succeeded, p3..pN -> properties of the try).
I need to calculate the amount of information carried by each property. After a quantization step (for example, to 10 levels) that brings all the input numbers onto the same scale, the input file looks like:
p0: 4 3 2 4 5 5 6 7
p3: 4 5 3 3
p4: 5 3 3 2 1 2 3
...
where p0 = funct(p1, p2). Not every input line contains every pK, so len(pK) <= len(p0).
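For context, the quantization step is roughly like this (just a sketch; I use equal-width binning into 10 levels with numpy, and the exact binning scheme doesn't matter for the question):

import numpy as np

def quantize(values, levels=10):
    # map raw values onto integer levels 1..levels using equal-width bins
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), levels + 1)
    return np.digitize(values, edges[1:-1]) + 1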
I know how to calculate the Shannon entropy of each property (each line above). What I need is the mutual information, and the calculation of the joint entropy for I(p0, pK) is where I'm stuck, because the sequences have different lengths.
I'm calculating the entropy of a single property like this:

import numpy as np

def entropy(x):
    x = np.asarray(x)  # works for lists or arrays of quantized levels
    probs = [np.mean(x == c) for c in set(x)]
    return np.sum([-p * np.log2(p) for p in probs])
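For example, on the quantized p0 from above:

p0 = np.array([4, 3, 2, 4, 5, 5, 6, 7])
print(entropy(p0))  # 2.5 bits for this sample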
So, for the joint entropy, do I need to use a product to generate the input array x, and use zip(p0, pK) instead of set(x)?
I'm assuming that you want to calculate the mutual information between p1 and each of p2, p3, ... in turn.
1) Calculate H(X) as the entropy of p1, with each x being a subsequent element of p1:

H(X) = -Σ_i p(x_i) log2 p(x_i)
2) Calculate H(Y) as the entropy of pK with the same equation, with each y being a subsequent element of pK.
3) Create a new collection of pairs out of p1 and pK:

pairs = list(zip(p1, pK))

Note that if the values in the columns of your data have different meanings per iteration, you should probably fill in the missing data first (for example with 0s, or with the values from the previous iteration) so that both columns have the same length - see the sketch after this step.
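A minimal sketch of that filling step, assuming each iteration is stored as a dict of the properties it actually reported (as in the raw data at the top of the question):

iterations = [
    {"p1": 10, "p2": 0, "p3": 12},
    {"p1": 43, "p2": 1},            # p3 missing in this iteration
    {"p1": 14, "p2": 0, "p3": 33},
]
p1 = [it.get("p1", 0) for it in iterations]  # fill missing values with 0
p3 = [it.get("p3", 0) for it in iterations]
pairs = list(zip(p1, p3))                    # now both have the same length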
4) Calculate the joint entropy H(X,Y) using:

H(X,Y) = -Σ_x Σ_y p(x,y) log2 p(x,y)
Note that you can't just use the first equation and treat each pair as a single element - in this equation you must iterate over the whole Cartesian product of the values of p1 and pK, calculating the probabilities from the pairs collection. To iterate over the Cartesian product of the distinct values, use for xy in itertools.product(set(p1), set(pK)): ....
5) Then the mutual information between p1 and pK is:

I(X;Y) = H(X) + H(Y) - H(X,Y)
Using numpy you can calculate the joint entropy like this:

import numpy as np

def joint_entropy(X, Y):
    X, Y = np.asarray(X), np.asarray(Y)
    probs = []
    # probability of every combination of distinct values of X and Y
    for c1 in set(X):
        for c2 in set(Y):
            probs.append(np.mean(np.logical_and(X == c1, Y == c2)))
    return np.sum([-p * np.log2(p) for p in probs if p > 0])
where the if p > 0 guard is consistent with the definition of entropy:

"In the case of p(x_i) = 0 for some i, the value of the corresponding summand 0 log_b(0) is taken to be 0."
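Putting steps 1-5 together, a small helper for the mutual information could look like this (a sketch reusing the entropy function from the question and the joint_entropy above; p3_filled is a placeholder for the pK column after filling in the missing values as in step 3):

def mutual_information(X, Y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y); X and Y must already have equal lengths
    return entropy(X) + entropy(Y) - joint_entropy(X, Y)

mi = mutual_information(p0, p3_filled)  # p3_filled: pK after step 3's filling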
If you don't want to use numpy, then a version without it might look something like this:

import itertools
import math

def entropyPart(p):
    # contribution of a single probability, with 0 * log(0) taken as 0
    if p == 0:
        return 0.0
    return -p * math.log2(p)

def joint_entropy(X, Y):
    pairs = list(zip(X, Y))
    probs = []
    # iterate over the Cartesian product of the distinct values
    for pair in itertools.product(set(X), set(Y)):
        probs.append(sum(p == pair for p in pairs) / len(pairs))
    return sum(entropyPart(p) for p in probs)
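For example, with two already-aligned quantized sequences (p0 truncated to 7 values here purely for illustration, paired with p4 from the sample):

X = [4, 3, 2, 4, 5, 5, 6]   # p0, truncated for the example
Y = [5, 3, 3, 2, 1, 2, 3]   # p4
print(joint_entropy(X, Y))  # log2(7) ≈ 2.807, since all 7 pairs are distinct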