I have a square matrix that represents the frequency counts of co-occurrences in a data set. In other words, the rows represent all possible observations of feature 1, and the columns are the possible observations of feature 2. The number in cell (x, y) is the number of times feature 1 was observed to be x at the same time feature 2 was y.
I want to calculate the mutual information contained in this matrix. MATLAB has a built-in information
function, but it takes 2 arguments, one for x and one for y. How would I manipulate this matrix to get the arguments it expects?
Alternatively, I wrote my own mutual information function that takes a matrix, but I'm unsure about its accuracy. Does it look right?
function [mutualinfo] = mutualInformation(counts)
total = sum(counts(:));
pX = sum(counts, 1) ./ total;
pY = sum(counts) ./ total;
pXY = counts ./ total;
[h, w] = size(counts);
mutualinfo = 0;
for row = 1:h
for col = 1:w
mutualinfo = mutualinfo + pXY(row, col) * log(pXY(row, col) / (pX(row)*pY(col)));
end;
end;
end
mi_discrete_cont(x, y, k) computes the mutual information for continuous variable x and discrete (integer values only) y , with k nearest neighbours. mi_cont_cont(x1, x2, k) computes the mutual information for continuous variables x1 and x2 , with k nearest neighbours.
The mutual information can also be calculated as the KL divergence between the joint probability distribution and the product of the marginal probabilities for each variable. — Page 57, Pattern Recognition and Machine Learning, 2006. This can be stated formally as follows: I(X ; Y) = KL(p(X, Y) || p(X) * p(Y))
The Mattes mutual information algorithm uses a single set of pixel locations for the duration of the optimization, instead of drawing a new set at each iteration. The number of samples used to compute the probability density estimates and the number of bins used to compute the entropy are both user selectable.
Usage: I=mi(A,B), where A and B are equally sized images/signals. Function hist2 (included) is used to determine the joint histogram of the images/signals. Assumptions: 1) 0*log(0)=0, 2) mutual information is obtained on the intersection between the supports of partial histograms.
I don't know of any built-in mutual information functions in MATLAB. Perhaps you got a hold of one of the submissions from the MathWorks File Exchange or some other third-party developer code?
I think there may be something wrong with how you are computing pX
and pY
. Plus, you can vectorize your operations instead of using for loops. Here's another version of your function to try out:
function mutualInfo = mutualInformation(counts)
pXY = counts./sum(counts(:));
pX = sum(pXY,2);
pY = sum(pXY,1);
mutualInfo = pXY.*log(pXY./(pX*pY));
mutualInfo = sum(mutualInfo(:));
end
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With