Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C/C++ optimization: negate doubles fast

I need to negate very large number of doubles quickly. If bit_generator generates 0, then the sign must be changed. If bit_generator generates 1, then nothing happens. The loop is run many times over and bit_generator is extremely fast. On my platform case 2 is noticeably faster than case 1. Looks like my CPU doesn't like branching. Is there any faster and portable way to do it? What do you think about case 3?

// generates 0 and 1
int bit_generator();

// big vector (C++)
vector<double> v;

// case 1
for (size_t i=0; i<v.size(); ++i)
    if (bit_generator()==0)
        v[i] = -v[i];

// case 2
const int sign[] = {-1, 1};
for (size_t i=0; i<v.size(); ++i)
        v[i] *= sign[bit_generator()];

// case 3
const double sign[] = {-1, 1};
for (size_t i=0; i<v.size(); ++i)
        v[i] *= sign[bit_generator()];

// case 4 uses C-array
double a[N];
double number_generator(); // generates doubles
double z[2]; // used as buffer
for (size_t i=0; i<N; ++i) {
        z[0] = number_generator();
        z[1] = -z[0];
        a[i] = z[bit_generator()];
}

EDIT: Added case 4 and C-tag, because the vector can be a plain array. Since I can control how doubles are generated, I redesigned the code as shown in case 4. It avoids extra multiplication and branching at the same. I presume it should be quite fast on all platforms.

like image 668
user401947 Avatar asked Dec 01 '22 05:12

user401947


1 Answers

Unless you want to resize the vector in the loop, lift the v.size() out of the for expression, i.e.

const unsigned SZ=v.size();
for (size_t i=0; i<SZ; ++i)
    if (bit_generator()==0)
        v[i] = -v[i];

If the compiler can't see what happens in bit_generator(), then it might be very hard for the compiler to prove that v.size() does not change, which makes loop unrolling or vectorization impossible.

UPDATE: I've made some tests and on my machine method 2 seems to be fastest. However, it seems to be faster to use a pattern which I call "group action" :-). Basically, you group multiple decisions into one value and switch over it:

const size_t SZ=v.size();
for (size_t i=0; i<SZ; i+=2) // manual loop unrolling
{
 int val=2*bit_generator()+bit_generator();
 switch(val) // only one conditional
 {
  case 0: 
     break; // nothing happes
  case 1: 
     v[i+1]=-v[i+1]; 
     break; 
  case 2: 
     v[i]=-v[i]; 
     break; 
  case 3: 
    v[i]=-v[i];
    v[i+1]=-v[i+1]; 
 }
}
// not shown: wrap up the loop if SZ%2==1 
like image 164
Nordic Mainframe Avatar answered Dec 04 '22 07:12

Nordic Mainframe