 

Do most compilers transform % 2 into bit comparison? Is it really faster?

In programming, one often needs to check if a number is odd or even. For that, we usually use:

n % 2 == 0

However, my understanding is that the '%' operator actually performs a division and returns its remainder; therefore, for the case above, it would be faster to simply check the last bit instead. Let's say n = 5;

5 = 00000101

In order to check if the number is odd or even, we just need to check the last bit. If it's 1, the number is odd; otherwise, it is even. In programming, it would be expressed like this:

(n & 1) == 0

In my understanding, this would be faster than % 2, since no division is performed; a mere bit test is needed. (Note the parentheses: in C, == binds tighter than &, so n & 1 == 0 would parse as n & (1 == 0).)

I have 2 questions then:

1) Is the second way really faster than the first (in all cases)?

2) If the answer for 1 is yes, are compilers (in all languages) smart enough to convert % 2 into a simple bit comparison? Or do we have to explicitly use the second way if we want the best performance?

asked Aug 16 '15 by Bitcoin Cash - ADA enthusiast

1 Answer

Yes, a bit test is much faster than integer division: by about a factor of 10 to 20, or even 100 for a 128-bit / 64-bit => 64-bit idiv on Intel. Especially since x86 has a test instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND is the compare.

I decided to actually check the compiler output on Godbolt, and got a surprise:

It turns out that using n % 2 as a signed integer value (e.g. return n % 2 from a function that returns signed int) instead of just testing it for non-zero (if (n % 2)) sometimes produces slower code than return n & 1. This is because (-1 % 2) == -1, while (-1 & 1) == 1, so the compiler can't use a bitwise AND. Compilers still avoid integer division, though, and use some clever shift / and / add / sub sequence instead, because that's still cheaper than an integer division. (gcc and clang use different sequences.)

So if you want to return a truth value based on n % 2, your best bet is to do it with an unsigned type. This lets the compiler always optimize it to a single AND instruction. (On Godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned even (%) function and the int even_bit (bitwise &) function have the same asm code.)

Using a bool (which must be 0 or 1, not just any non-zero value) is another option, but the compiler will have to do extra work to return (bool) (n % 4) (or any test other than n%2). The bitwise-and version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient setcc instruction that sets a register to 0 or 1, depending on the flags, so it's still only 2 instructions instead of 1. clang/gcc use this, see aligned4_bool in the godbolt asm output.)

With any optimization level higher than -O0, gcc and clang optimize if (n%2) to what we expect. The other huge surprise is that icc 13 doesn't. I don't understand WTF icc thinks it's doing with all those branches.

answered Oct 22 '22 by Peter Cordes