Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster way to take xor between one row and a matrix of 1's and 0's?

Tags:

arrays

matlab

I have a row of data, say A = [0 1 1 1 0 0].

Matrix B contains many rows. For a dummy example let's say it's just B = [1 1 1 0 1 0; 1 0 0 1 0 1].

I want to find the number of columns in which A and a row of B differ, and use that vector of differences to find which row of B is most similar to A. So for the example above, A differs from B(1,:) in columns 1, 4, 5 = 3 total difference. A differs from B(2,:) in columns 1, 2, 3, 6 = 4 total differences, and so I would want to return index 1 to indicate that A is most similar to B(1,:).

In reality B has ~50,000 rows, and A and B both have about 800 columns. My current code to find the most similar row is below:

min(sum(xor(repmat(A,B_rows,1),B),2));

That works, but it's very slow. Any insights into which function is taking so long and ways to improve it?

like image 960
user1956609 Avatar asked Dec 26 '22 15:12

user1956609


2 Answers

There are 6 more or less similar approaches with bsxfun, hard to tell which one is the fastest.

%example data
A = [0 1 1 1 0 0];
B = perms(A);     %720x6 matrix

Counting similarities:

Z = sum(  bsxfun(@eq,  A,B) , 2);
Z = sum( ~bsxfun(@ne,  A,B) , 2);
Z = sum( ~bsxfun(@xor, A,B) , 2);

Counting differences:

Z = sum( ~bsxfun(@eq,  A,B) , 2);
Z = sum(  bsxfun(@ne,  A,B) , 2);
Z = sum(  bsxfun(@xor, A,B) , 2);

Z is a vector containing the number of equal/unequal elements of A for every row of B.

Benchmark for 100 trials each (same order as above):

t100 =

    0.0081
    0.0098
    0.0123
    0.0102
    0.0087
    0.0111
like image 112
Robert Seifert Avatar answered Dec 28 '22 05:12

Robert Seifert


Use pdist2 with 'hamming' distance

[D I] = pdist2( A, B, 'hamming', 'Smallest', 1 );
like image 37
Shai Avatar answered Dec 28 '22 05:12

Shai