Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast dot product for a very special case

Given a vector X of size L, where every scalar element of X is from a binary set {0,1}, it is to find a dot product z=dot(X,Y) if vector Y of size L consists of the integer-valued elements. I suggest, there must exist a very fast way to do it.

Let's say we have L=4; X[L]={1, 0, 0, 1}; Y[L]={-4, 2, 1, 0} and we have to find z=X[0]*Y[0] + X[1]*Y[1] + X[2]*Y[2] + X[3]*Y[3] (which in this case will give us -4).

It is obvious that X can be represented using binary digits, e.g. an integer type int32 for L=32. Then, all what we have to do is to find a dot product of this integer with an array of 32 integers. Do you have any idea or suggestions how to do it very fast?

like image 271
psihodelia Avatar asked May 13 '10 10:05

psihodelia


2 Answers

This really would require profiling but an alternative you might want to consider:

int result=0;
int mask=1;
for ( int i = 0; i < L; i++ ){
    if ( X & mask ){
        result+=Y[i];
    }
    mask <<= 1;
}

Typically bit shifting and bitwise operations are faster than multiplication, however, the if statement might be slower than a multiplication, although with branch prediction and large L my guess is it might be faster. You would really have to profile it, though, to determine if it resulted in any speedup.

As has been pointed out in the comments below, unrolling the loop either manually or via a compiler flag (such as "-funroll-loops" on GCC) could also speed this up (eliding the loop condition).

Edit
In the comments below, the following good tweak has been proposed:

int result=0;
for ( int i = 0; i < L; i++ ){
    if ( X & 1 ){
        result+=Y[i];
    }
    X >>= 1;
}
like image 133
Michael Aaron Safyan Avatar answered Oct 13 '22 00:10

Michael Aaron Safyan


Is a suggestion to look into SSE2 helpful? It has dot-product type operations already, plus you can trivially do 4 (or perhaps 8, I forget the register size) simple iterations of your naive loop in parallel. SSE also has some simple logic-type operations so it may be able to do additions rather than multiplications without using any conditional operations... again you'd have to look at what ops are available.

like image 25
Mr. Boy Avatar answered Oct 12 '22 23:10

Mr. Boy