Find groups with high cross correlation matrix in Matlab

Given a lower triangular matrix (100x100) containing cross-correlation values, where entry 'ij' is the correlation between signal 'i' and signal 'j', a high value means that the two signals belong to the same class of objects. Knowing that there are at most four distinct classes in the data set, is there a fast and effective way to assign all the signals to the classes, rather than searching and cross-checking all the entries against each other? The following 7x7 matrix may help illustrate the point:

 1      0       0       0       0       0       0
.2      1       0       0       0       0       0
.8      .15     1       0       0       0       0
.9      .17     .8      1       0       0       0
.23     .8      .15     .14     1       0       0
.7      .13     .77     .83     .11     1       0
.1      .21     .19     .11     .17     .16     1

There are three classes in this example:

class 1: rows <1 3 4 6>,
class 2: rows <2 5>,
class 3: rows <7>
user1641496 asked Sep 02 '12 07:09


2 Answers

This is a good problem for hierarchical clustering. Complete linkage clustering gives compact clusters; all you have to do is choose the cutoff distance at which two clusters should be considered different.

First, you need to convert the correlation matrix to a dissimilarity matrix. Since the correlations here are between 0 and 1, 1-correlation will work well: high correlations get a score close to 0, and low correlations get a score close to 1. Assume that the correlations are stored in an array corrMat:

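%# select the strictly lower triangle (this drops the diagonal)
%# a logical mask keeps entries that are exactly 0, which find() would skip
mask = tril(true(size(corrMat)),-1);
%# and convert to a row vector (same ordering as pdist output)
dissimilarity = 1 - corrMat(mask)';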

%# decide on a cutoff
%# remember that 0.4 corresponds to corr of 0.6!
cutoff = 0.5; 

%# perform complete linkage clustering
Z = linkage(dissimilarity,'complete');

%# group the data into clusters
%# (cutoff is at a correlation of 0.5)
groups = cluster(Z,'cutoff',cutoff,'criterion','distance')
groups =
     2
     3
     2
     2
     3
     2
     1
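
As a self-contained check, the same steps can be run on the 7x7 matrix from the question. This is only a sketch, assuming the Statistics Toolbox functions linkage and cluster are available. It selects the strictly lower triangle with a logical tril mask, which keeps an entry even when a correlation is exactly 0. Because the group numbers that cluster returns are arbitrary labels, the loop prints the members of each group rather than relying on specific numbers.

```matlab
% Lower-triangular correlation matrix from the question (7 signals)
corrMat = [ 1    0    0    0    0    0    0
            .2   1    0    0    0    0    0
            .8   .15  1    0    0    0    0
            .9   .17  .8   1    0    0    0
            .23  .8   .15  .14  1    0    0
            .7   .13  .77  .83  .11  1    0
            .1   .21  .19  .11  .17  .16  1 ];

% Select the strictly lower triangle with a logical mask.
% Column-major order of these entries matches the order pdist uses.
mask = tril(true(size(corrMat)), -1);
dissimilarity = 1 - corrMat(mask)';

% A dissimilarity cutoff of 0.5 corresponds to a correlation of 0.5
cutoff = 0.5;
Z = linkage(dissimilarity, 'complete');
groups = cluster(Z, 'cutoff', cutoff, 'criterion', 'distance');

% Label numbers are arbitrary, so list the members of each group instead
for g = 1:max(groups)
    fprintf('group %d: rows %s\n', g, mat2str(find(groups == g)'));
end
```

The partition should match the classes listed in the question: rows 1, 3, 4 and 6 together, rows 2 and 5 together, and row 7 on its own.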

To check the result, you can visualize the dendrogram:

dendrogram(Z,0,'colorthreshold',cutoff)
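
If the number of classes is known rather than the cutoff, cluster also accepts a 'maxclust' option that cuts the tree into at most that many groups. The sketch below is an illustration, not part of the original answer: corrVec holds the question's 7x7 lower-triangle values typed out in pdist order, and nClasses is a made-up name set to 3 for this small example (for the 100x100 data the bound would be 4).

```matlab
% Lower-triangle correlations of the 7x7 example, in pdist order:
% (2,1),(3,1),...,(7,1),(3,2),...,(7,2),...,(7,6)
corrVec = [.2 .8 .9 .23 .7 .1 ...
           .15 .17 .8 .13 .21 ...
           .8 .15 .77 .19 ...
           .14 .83 .11 ...
           .11 .17 ...
           .16];

% Complete linkage on the dissimilarities 1 - correlation
Z = linkage(1 - corrVec, 'complete');

% Assumed number of classes for this example (hypothetical variable name)
nClasses = 3;
groups = cluster(Z, 'maxclust', nClasses);

% Show which rows ended up in which group
disp([(1:numel(groups))' groups]);
```

Note that 'maxclust' only bounds the number of groups and will generally use as many as it is allowed, so when the true number of classes may be smaller than the bound, a distance cutoff is probably the safer choice.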


Jonas answered Oct 06 '22 00:10


You can use the following method instead of creating the dissimilarity matrix.

Z = linkage(corrMat,'complete','correlation')

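With a metric argument, linkage computes the distances itself: it treats each row of corrMat as an observation and uses the correlation distance (one minus the correlation) between rows, so corrMat should be the full symmetric matrix rather than only its lower triangle. You can then plot the dendrogram as follows: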

dendrogram(Z);

One way to check a dendrogram built with complete linkage from 1-correlation dissimilarities is its maximum height, which should equal 1 minus the smallest correlation between any two signals. If the smallest correlation is 0, the maximum height of the tree should be 1. If it is -1 (negative correlation), the height should be 2.
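
The height check can be scripted. The sketch below is an illustration under stated assumptions rather than the answerer's code: fullCorr and its four-signal values are made up, and the dissimilarity vector is built explicitly with squareform so that the tree heights are exactly 1 minus a correlation. It then compares the height of the last merge in Z with 1 minus the smallest off-diagonal correlation.

```matlab
% Hypothetical full symmetric correlation matrix for four signals
fullCorr = [ 1    .9   .2   .1
             .9   1    .3   .15
             .2   .3   1    .8
             .1   .15  .8   1 ];

% squareform turns a symmetric, zero-diagonal matrix into pdist's vector form
dissimilarity = squareform(1 - fullCorr);
Z = linkage(dissimilarity, 'complete');

% With complete linkage the last merge joins everything, so its height is
% the largest pairwise dissimilarity, i.e. 1 - (smallest correlation)
treeHeight = Z(end, 3);
offDiag = fullCorr(~eye(size(fullCorr)));
expected = 1 - min(offDiag);

fprintf('tree height %.2f, expected %.2f\n', treeHeight, expected);
```

If the two numbers differ, the vector passed to linkage is probably not in the order or form that pdist would produce.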

Sammy answered Oct 06 '22 01:10