In programming, one often needs to check if a number is odd or even. For that, we usually use:
n % 2 == 0
However, my understanding is that the '%' operator actually performs a division and returns its remainder; therefore, for the case above, it would be faster to simply check the last bit instead. Let's say n = 5:
5 = 00000101
In order to check if the number is odd or even, we just need to check the last bit. If it's 1, the number is odd; otherwise, it is even. In programming, it would be expressed like this:
(n & 1) == 0
In my understanding, this would be faster than % 2, since no division is performed; only a bit comparison is needed.
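For concreteness, here are the two checks written out as C functions (a sketch; the function names are my own):

#include <stdbool.h>

/* The usual way: remainder of division by 2. */
bool is_even_mod(unsigned n) {
    return n % 2 == 0;
}

/* The way I have in mind: look only at the lowest bit. */
bool is_even_bit(unsigned n) {
    return (n & 1) == 0;
}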
I have 2 questions then:
1) Is the second way really faster than the first (in all cases)?
2) If the answer to 1 is yes, are compilers (in all languages) smart enough to convert % 2 into a simple bit comparison? Or do we have to explicitly use the second way if we want the best performance?
Yes, a bit-test is much faster than integer division, by about a factor of 10 to 20, or even 100 for a 128-bit / 64-bit = 64-bit idiv on Intel. Especially since x86 at least has a test instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND is the compare.
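As a concrete sketch of that (my own example, not taken from the Godbolt link discussed below):

#include <stdbool.h>

/* At -O2 on x86-64, compilers turn this into a bit test of the low bit
 * plus a 0/1 materialization -- clang, for instance, emits something like
 *     test  dil, 1
 *     sete  al
 * No div/idiv instruction appears anywhere. */
bool is_even(unsigned n) {
    return n % 2 == 0;
}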
I decided to actually check the compiler output on Godbolt, and got a surprise:
It turns out that using n % 2 as a signed integer value (e.g. return n % 2 from a function that returns signed int) instead of just testing it for non-zero (if (n % 2)) sometimes produces slower code than return n & 1. This is because (-1 % 2) == -1, while (-1 & 1) == 1, so the compiler can't use a bare bitwise AND. Compilers still avoid integer division, though, and use a clever shift / and / add / sub sequence instead, because that's still cheaper than a division. (gcc and clang use different sequences.)
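A minimal sketch of that signed case (function names are mine):

/* Must return -1 for negative odd n, so a bare AND is not enough.
 * gcc and clang emit a short shift/and/add/sub sequence here rather
 * than an idiv. */
int rem2_signed(int n) {
    return n % 2;      /* -1, 0, or 1 */
}

/* Always 0 or 1 -- a different answer for negative odd n, which is
 * why this one can be a single AND. */
int rem2_bit(int n) {
    return n & 1;
}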
So if you want to return a truth value based on n % 2, your best bet is to do it with an unsigned type. This lets the compiler always optimize it to a single AND instruction. (On Godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned even (%) function and the int even_bit (bitwise &) function have the same asm code.)
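Roughly what those two Godbolt functions look like (my reconstruction; the exact bodies in the original link may differ slightly):

/* Unsigned operand: the compiler knows n % 2 is just the low bit, so
 * this becomes a single AND/TEST plus a flag-based set on x86, ARM,
 * and PowerPC. */
int even(unsigned n) {
    return n % 2 == 0;
}

/* Explicit bit test on a signed int -- compiles to the same asm. */
int even_bit(int n) {
    return (n & 1) == 0;
}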
Using a bool (which must be 0 or 1, not just any non-zero value) is another option, but the compiler will have to do extra work to return (bool)(n % 4) (or any test other than n % 2). The bitwise-AND version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient setcc instruction that sets a register to 0 or 1 depending on the flags, so it's still only 2 instructions instead of 1. clang/gcc use this; see aligned4_bool in the Godbolt asm output.)
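Something along these lines (my guess at the shape of that function; the original may test (n % 4) == 0 instead, which compiles the same way):

#include <stdbool.h>

/* n % 4 can be 0, 1, 2, or 3, so the compiler must collapse any
 * non-zero value to exactly 1 for the bool return.  On x86 that is
 * still just a TEST of the low two bits plus a SETcc -- two
 * instructions, and still no division. */
bool aligned4_bool(unsigned n) {
    return n % 4;      /* true when n is NOT a multiple of 4 */
}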
With any optimization level higher than -O0, gcc and clang optimize if (n % 2) to what we expect. The other huge surprise is that icc 13 doesn't. I don't understand WTF icc thinks it's doing with all those branches.
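For completeness, the "just test it" form that the previous paragraph refers to (a sketch; do_work is a hypothetical helper for illustration):

void do_work(int n);   /* assumed external helper */

/* At -O1 and above, gcc and clang compile the condition to a single
 * TEST of the low bit followed by a conditional jump -- no division,
 * and no sign handling needed because only truth/falsehood matters. */
void every_other(int n) {
    if (n % 2)
        do_work(n);
}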