Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast inner product of ternary vectors

Consider two vectors, A and B, of size n, 7 <= n <= 23. Both A and B consists of -1s, 0s and 1s only.

I need a fast algorithm which computes the inner product of A and B.

So far I've thought of storing the signs and values in separate uint32_ts using the following encoding:

  • sign 0, value 0 → 0
  • sign 0, value 1 → 1
  • sign 1, value 1 → -1.

The C++ implementation I've thought of looks like the following:

struct ternary_vector {
    uint32_t sign, value;
};

int inner_product(const ternary_vector & a, const ternary_vector & b) {
    uint32_t psign = a.sign ^ b.sign;
    uint32_t pvalue = a.value & b.value;
    psign &= pvalue;
    pvalue ^= psign;
    return __builtin_popcount(pvalue) - __builtin_popcount(psign);
}

This works reasonably well, but I'm not sure whether it is possible to do it better. Any comment on the matter is highly appreciated.

like image 419
user92382 Avatar asked Nov 01 '13 18:11

user92382


1 Answers

I like having the 2 uint32_t, but I think your actual calculation is a bit wasteful

Just a few minor points:

  • I'm not sure about the reference (getting a and b by const &) - this adds a level of indirection compared to putting them on the stack. When the code is this small (a couple of clocks maybe) this is significant. Try passing by value and see what you get

  • __builtin_popcount can be, unfortunately, very inefficient. I've used it myself, but found that even a very basic implementation I wrote was far faster than this. However - this is dependent on the platform.

Basically, if the platform has a hardware popcount implementation, __builtin_popcount uses it. If not - it uses a very inefficient replacement.

like image 139
rabensky Avatar answered Nov 08 '22 12:11

rabensky