 

Tallying co-incidences of numbers in columns of a matrix - MATLAB

Tags:

matlab

I have a matrix A of the following form (much larger in reality):

205   204   201
202   208   202

How can I tally the co-incidence of numbers on a column-by-column basis and then output this to a matrix?

I'd like the final matrix to span min(A(:)):max(A(:)) (or a range I can specify) along both the rows and the columns, and to tally the co-incidences of numbers within each column. Using the above example:

    200 201 202 203 204 205 206 207 208
200  0   0   0   0   0   0   0   0   0
201  0   0   1   0   0   0   0   0   0
202  0   0   0   0   0   1   0   0   0 
203  0   0   0   0   0   0   0   0   0
204  0   0   0   0   0   0   0   0   1
205  0   0   0   0   0   0   0   0   0
206  0   0   0   0   0   0   0   0   0
207  0   0   0   0   0   0   0   0   0
208  0   0   0   0   0   0   0   0   0

(Matrix labels are not required)

Two important points: the tallying must not count reciprocal pairs twice, and it must follow numerical order. For example, a column containing:

205
202

should be tallied as 202 occurring with 205 (row 202, column 205 in the matrix above), but NOT as 205 occurring with 202, the duplicate reciprocal. The smaller number of each pair is always the reference, i.e. the row.
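
To pin the rule down before vectorizing, here is a plain-loop reference sketch (not taken from any of the answers below). It assumes A has exactly two rows, as in the example, and skips any column with a value outside the chosen range; lo, hi, n and T are illustrative names only:

```matlab
A = [205 204 201
     202 208 202];   %// example data from above
lo = 200; hi = 208;  %// illustrative range; min(A(:)) and max(A(:)) give the full span
n = hi - lo + 1;
T = zeros(n);        %// T(i,j): how often lo+i-1 occurred with lo+j-1 in one column
for k = 1:size(A, 2)
    v = sort(A(:, k));                   %// ascending: smaller value first
    if v(1) < lo || v(2) > hi
        continue                         %// column not entirely inside the range
    end
    r = v(1) - lo + 1;                   %// row index: the smaller number (the reference)
    c = v(2) - lo + 1;                   %// column index: the larger number
    T(r, c) = T(r, c) + 1;               %// tally the ordered pair once
end
T
```

A loop like this is slow for a large A, but it is handy for checking the vectorized answers on small inputs.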


asked Oct 31 '14 at 10:10 by AnnaSchumann


3 Answers

sparse to the rescue!

Let your data and desired range be defined as

A = [ 205   204   201
      202   208   202 ]; %// data. Two-row matrix
limits = [200 208]; %// desired range. It needn't include all values of A

Then

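lim1 = limits(1)-1;                             %// offset so that limits(1) maps to index 1
s = limits(2)-lim1;                             %// side length of the square output
cols = all((A>=limits(1)) & (A<=limits(2)), 1); %// keep columns entirely within range
B = sort(A(:,cols), 1, 'descend')-lim1;         %// larger value in row 1, smaller in row 2
R = full(sparse(B(2,:), B(1,:), 1, s, s));      %// row = smaller, col = larger; repeats are summed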

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
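
Why sparse does the tallying: when a (row, column) pair appears more than once in the index vectors, sparse adds the corresponding values instead of overwriting them, so giving every column a value of 1 counts occurrences. A tiny illustration with made-up indices i and j:

```matlab
i = [2 2 3];                     %// row indices; the pair (2,6) appears twice
j = [6 6 9];                     %// column indices
S = full(sparse(i, j, 1, 9, 9)); %// scalar value 1 is expanded; repeated pairs are summed
S(2,6)                           %// 2
S(3,9)                           %// 1
```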

Alternatively, you can dispense with sort and use matrix addition followed by triu to obtain the same result (possibly faster):

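lim1 = limits(1)-1;                                 %// offset so that limits(1) maps to index 1
s = limits(2)-lim1;                                 %// side length of the square output
cols = all( (A>=limits(1)) & (A<=limits(2)) , 1);   %// keep columns entirely within range
R = full(sparse(A(2,cols)-lim1, A(1,cols)-lim1, 1, s, s)); %// unordered pairs, repeats summed
R = triu(R + R.') - diag(diag(R));  %// fold onto upper triangle; equal pairs stay counted once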

Both approaches handle repeated columns (including columns that hold the same pair in a different order), correctly increasing their tally. For example,

A = [205   204   201
     201   208   205]

gives

R =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
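
The question also asks for the range to default to the span of the data. A sketch of the sort-based version with the limits taken from A itself; the data below is made up, and its third column holds two equal values to show that such a pair is tallied once, on the diagonal:

```matlab
A = [205 204 203
     201 208 203];              %// made-up data; third column repeats a value
limits = [min(A(:)) max(A(:))]; %// default range: smallest to largest value in A
lim1 = limits(1)-1;
s = limits(2)-lim1;
cols = all((A>=limits(1)) & (A<=limits(2)), 1);
B = sort(A(:,cols), 1, 'descend')-lim1;
R = full(sparse(B(2,:), B(1,:), 1, s, s))
```

Here the output is 8-by-8 (201 to 208), with single tallies at (1,5), (4,8) and (3,3).
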
answered Sep 28 '22 at 12:09 by Luis Mendo


See if this is what you were after -

range1 = 200:208; %// Set the range

A = A(:,all(A>=min(range1)) & all(A<=max(range1))); %// keep only the columns of A
                                                   %// whose entries all fall within range1
A_off = A-range1(1)+1; %// Get the offset indices from A

A_off_sort = sort(A_off,1); %// sort offset indices to satisfy the "smallest" criterion

out = zeros(numel(range1)); %// storage for output matrix
idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:)); %// get the indices to be set

unqidx = unique(idx);
out(unqidx) = histc(idx,unqidx) %// set coincidences

With

A = [205   204   201
     201   208   205]

this gets -

out =
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
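
histc is listed as not recommended in current MATLAB documentation. A sketch of the counting step using accumarray on the linear indices instead of unique plus histc; the earlier steps are repeated so the snippet runs on its own, and n is an illustrative name:

```matlab
A = [205 204 201
     201 208 205];
range1 = 200:208;
n = numel(range1);
A = A(:, all(A>=min(range1)) & all(A<=max(range1))); %// in-range columns only
A_off_sort = sort(A - range1(1) + 1, 1);            %// smaller offset in row 1
idx = sub2ind([n n], A_off_sort(1,:), A_off_sort(2,:));
out = reshape(accumarray(idx(:), 1, [n*n 1]), n, n) %// count each linear index
```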

A few performance-oriented tricks could be used here:

I. Replace

out = zeros(numel(range1)); 

with

out(numel(range1),numel(range1)) = 0;

II. Replace

idx = sub2ind(size(out),A_off_sort(1,:),A_off_sort(2,:))  

with

idx = (A_off_sort(2,:)-1)*numel(range1)+A_off_sort(1,:)
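
A quick sanity check of both tricks, sketched on the offsets from the example above (n stands in for numel(range1); idx_fast and idx_ref are illustrative names). The hand-written linear index matches sub2ind for an n-by-n matrix. Trick I only yields an all-zero matrix if out is not already defined, since assigning to out(n,n) keeps any existing contents, hence the clear:

```matlab
n = 9;                                 %// numel(range1) in the answer
A_off_sort = [2 5 2
              6 9 6];                  %// sorted offsets from the example
idx_fast = (A_off_sort(2,:)-1)*n + A_off_sort(1,:); %// trick II
idx_ref  = sub2ind([n n], A_off_sort(1,:), A_off_sort(2,:));
isequal(idx_fast, idx_ref)             %// true: same linear indices

clear out                              %// trick I assumes out does not exist yet
out(n,n) = 0;                          %// creates an n-by-n matrix of zeros
isequal(out, zeros(n))                 %// true
```
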
answered Sep 28 '22 at 11:09 by Divakar


What about a solution using accumarray? I would first sort each column independently, then use the first row (the smaller value) as the row subscript and the second row (the larger value) as the column subscript into the final accumulation matrix. Something like:

limits = 200:208;
A = A(:,all(A>=min(limits)) & all(A<=max(limits))); %// Borrowed from Divakar

%// Sort the columns individually and bring down to 1-indexing
B = sort(A, 1) - limits(1) + 1;

%// Create co-occurrence matrix
C = accumarray(B.', 1, [numel(limits) numel(limits)]);

With:

A = [205   204   201
     202   208   202]

This is the output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     1     0     0     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0

With duplicates (borrowed from Luis Mendo):

A = [205   204   201
     201   208   205]

Output:

C =

     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     2     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     1
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0
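
To gain some confidence that the approaches agree, here is a sketch that runs the sort/sparse version, the linear-index version (with accumarray standing in for histc) and the accumarray-on-subscripts version on random two-row data, including out-of-range values and equal pairs. The data size and value range are arbitrary choices:

```matlab
A = randi([198 210], 2, 1000);  %// random data; some values fall outside the range
limits = 200:208;
n = numel(limits);
A = A(:, all(A>=min(limits)) & all(A<=max(limits)));  %// keep in-range columns

%// sort + sparse (larger value in row 1)
B1 = sort(A, 1, 'descend') - limits(1) + 1;
R1 = full(sparse(B1(2,:), B1(1,:), 1, n, n));

%// sort + linear indices + accumarray
B2 = sort(A, 1) - limits(1) + 1;
idx = sub2ind([n n], B2(1,:), B2(2,:));
R2 = reshape(accumarray(idx(:), 1, [n*n 1]), n, n);

%// accumarray on subscripts
R3 = accumarray(B2.', 1, [n n]);

isequal(R1, R2, R3)             %// true if all three tallies match
sum(R1(:)) == size(A, 2)        %// every kept column tallied exactly once
nnz(tril(R1, -1)) == 0          %// nothing below the diagonal
```
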
answered Sep 28 '22 at 11:09 by rayryeng