
CUDA: Why are bitwise operators sometimes faster than logical operators?

When I am down to squeezing the last bit of performance out of a kernel, I usually find that replacing the logical operators (&& and ||) with bitwise operators (& and |) makes the kernel a little bit faster. This was observed by looking at the kernel time summary in CUDA Visual Profiler.

So, why are bitwise operators faster than logical operators in CUDA? I must admit that they are not always faster, but a lot of times they are. I wonder what magic can give this speedup.

Disclaimer: I am aware that logical operators short-circuit and bitwise operators do not. I am well aware of how these operators can be misused resulting in wrong code. I use this replacement with care only when the resulting logic remains the same, there is a speedup and the speedup thus obtained matters to me :-)
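To make the kind of misuse mentioned in the disclaimer concrete, here is a minimal sketch in plain C++ (CUDA C++ follows the same rules for these operators). The function names are hypothetical and not part of any real kernel. It shows that swapping && for & changes the result when the operands are arbitrary integers rather than 0/1 values, and one way to keep the logic identical.

```cpp
// Sketch only: all function names here are hypothetical.

// Logical AND: each operand is first converted to bool (non-zero means true).
bool both_logical(int a, int b) { return a && b; }

// Naive swap: bitwise AND combines the raw bit patterns,
// so two non-zero values with no common bits give 0.
bool both_bitwise_naive(int a, int b) { return a & b; }

// Safer swap: reduce each operand to a 0/1 value first, then combine with &.
bool both_bitwise_safe(int a, int b) { return (a != 0) & (b != 0); }
```

With a = 2 and b = 1, the logical and the safe bitwise versions return true, while the naive bitwise version returns false because 2 & 1 is 0.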

Ashwin Nanjappa asked Mar 28 '12 11:03



3 Answers

A && B:

if (!A) {
  return 0;
}
if (!B) {
  return 0;
}
return 1;

A & B:

return A & B;

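These are the semantics, bearing in mind that evaluating A and B can have side effects (they can be function calls that alter the state of the system when evaluated). With A && B, B is evaluated only if A is true; with A & B, both operands are always evaluated.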
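The side-effect point can be observed directly. The following is a small host-side C++ sketch; the counter `calls` and the helper `check` are hypothetical names introduced only for illustration. It counts how many operands actually get evaluated under each operator.

```cpp
// Sketch only: `calls` and `check` are hypothetical illustration helpers.
int calls = 0;

// Returns its argument and records that it was evaluated (the side effect).
bool check(bool value) {
    ++calls;
    return value;
}

// With &&, the second operand is skipped once the first one is false.
int calls_with_logical() {
    calls = 0;
    bool r = check(false) && check(true);
    (void)r;  // result unused; only the call count matters here
    return calls;
}

// With &, both operands are always evaluated (in an unspecified order).
int calls_with_bitwise() {
    calls = 0;
    bool r = check(false) & check(true);
    (void)r;
    return calls;
}
```

Here calls_with_logical() returns 1 and calls_with_bitwise() returns 2, which is why the compiler is only free to treat the two forms alike when it can prove the operands have no side effects.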

There are many ways that the compiler can optimize the A && B case, depending on the types of A and B and the context.

Roger Dahl answered Oct 01 '22 02:10


Logical operators will often result in branches, particularly when the rules of short circuit evaluation need to be observed. For normal CPUs this can mean branch misprediction and for CUDA it can mean warp divergence. Bitwise operations do not require short circuit evaluation so the code flow is linear (i.e. branchless).
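As an illustration of the branchy versus branchless forms, here is a sketch of a bounds check written both ways. The function names and the `HD` macro are assumptions made for this example; the macro only lets the same code build with a plain C++ compiler or with nvcc. Because the comparisons have no side effects and are always safe to evaluate, the two functions return the same result. The compiler may well generate branchless code for the first form on its own, which would fit the observation that the swap does not always help.

```cpp
// Sketch only: HD, inside_logical and inside_bitwise are hypothetical names.
#ifdef __CUDACC__
#define HD __host__ __device__  // callable from kernels when built with nvcc
#else
#define HD                      // plain C++ build
#endif

// Short-circuit form: each && may become a conditional branch.
HD bool inside_logical(int x, int y, int w, int h) {
    return x >= 0 && x < w && y >= 0 && y < h;
}

// Bitwise form: all four comparisons are evaluated and their 0/1 results
// are combined with &, so no branch is needed to produce the value.
HD bool inside_bitwise(int x, int y, int w, int h) {
    return (x >= 0) & (x < w) & (y >= 0) & (y < h);
}
```

The bitwise form is only a safe replacement here because every comparison can be evaluated unconditionally; a guard such as a null-pointer or index check followed by a memory access must keep its short-circuit form.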

Paul R answered Oct 01 '22 03:10


Bitwise operations can be carried out in registers at the hardware level. Register operations are the fastest, especially when the data fits in a register. Logical operations involve expression evaluation, which may not be register-bound. Typically &, |, ^, >> and similar operators are among the fastest operations and are widely used in high-performance logic.

questzen answered Oct 01 '22 03:10