So, just for fun, and out of curiosity, I wanted to see which executes faster when doing an even/odd check: modulus or a bitwise comparison.
So, I whipped up the following, but I'm not sure that it's behaving correctly, as the difference is so small. I read somewhere online that bitwise should be an order of magnitude faster than modulus checking.
Is it possible that it's getting optimized away? I've only just started tinkering with assembly; otherwise I'd attempt to dissect the executable a bit.
EDIT 3: Here is a working test, thanks in a large way to @phonetagger:
#include <stdio.h>
#include <time.h>
#include <stdint.h>

// to reset the global
static const int SEED = 0x2A;

// 5B iterations, each
static const int64_t LOOPS = 5000000000;

int64_t globalVar;

// gotta call something
int64_t doSomething( int64_t input )
{
    return 1 + input;
}

int main(int argc, char *argv[])
{
    globalVar = SEED;

    // mod
    clock_t startMod = clock();
    for( int64_t i=0; i<LOOPS; ++i )
    {
        if( ( i % globalVar ) == 0 )
        {
            globalVar = doSomething(globalVar);
        }
    }
    clock_t endMod = clock();
    double modTime = (double)(endMod - startMod) / CLOCKS_PER_SEC;

    globalVar = SEED;

    // bit
    clock_t startBit = clock();
    for( int64_t j=0; j<LOOPS; ++j )
    {
        if( ( j & globalVar ) == 0 )
        {
            globalVar = doSomething(globalVar);
        }
    }
    clock_t endBit = clock();
    double bitTime = (double)(endBit - startBit) / CLOCKS_PER_SEC;

    printf("Mod: %lf\n", modTime);
    printf("Bit: %lf\n", bitTime);
    printf("Dif: %lf\n", ( modTime > bitTime ? modTime-bitTime : bitTime-modTime ));

    return 0;
}
5 billion iterations of each loop, with a global variable to defeat compiler optimization, yields the following:
Mod: 93.099101
Bit: 16.701401
Dif: 76.397700
gcc foo.c -std=c99 -S -O0 (note, I specifically did -O0) for x86 gave me the same assembly for both loops. Operator strength reduction meant that both ifs used an andl to get the job done (which is faster than a modulo on Intel machines):
First Loop:
.L6:
        movl    72(%esp), %eax
        andl    $1, %eax
        testl   %eax, %eax
        jne     .L5
        call    doNothing
.L5:
        addl    $1, 72(%esp)
.L4:
        movl    LOOPS, %eax
        cmpl    %eax, 72(%esp)
        jl      .L6

Second Loop:
.L9:
        movl    76(%esp), %eax
        andl    $1, %eax
        testl   %eax, %eax
        jne     .L8
        call    doNothing
.L8:
        addl    $1, 76(%esp)
.L7:
        movl    LOOPS, %eax
        cmpl    %eax, 76(%esp)
        jl      .L9
The minuscule difference you see is probably due to the resolution/inaccuracy of clock().
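If you want more trustworthy numbers, a higher-resolution timer helps. Here's a minimal sketch, assuming a POSIX system (clock_gettime is POSIX, not ISO C, and older glibc needs -lrt at link time); the helper name now_seconds is my own:

#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under -std=c99 */
#include <stdio.h>
#include <time.h>

/* Monotonic timestamp in seconds; immune to wall-clock adjustments. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

int main(void)
{
    double start = now_seconds();
    /* ... code under test goes here ... */
    double end = now_seconds();
    printf("Elapsed: %f s\n", end - start);
    return 0;
}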
Most compilers will compile both of the following to EXACTLY the same machine instruction(s):
if( ( i % 2 ) == 0 )
if( ( i & 1 ) == 0 )
...even without ANY "optimization" turned on. The reason is that you are MOD-ing and AND-ing with constant values, and a %2 operation is, as any compiler writer should know, functionally equivalent to an &1 operation. In fact, a MOD by any power of 2 has an equivalent AND operation.

If you really want to test the difference, you'll need to make the right-hand side of both operations variable, and to be absolutely sure the compiler's cleverness isn't thwarting your efforts, you'll need to bury the variables' initializations somewhere the compiler can't tell what their runtime values will be. That means passing the values into a GLOBALLY-DECLARED (i.e. not 'static') test function as parameters; the compiler then can't trace back to their definitions and substitute constants, because theoretically any external caller could pass in any values. Alternatively, you could leave the code in main() and make the variables global, in which case the compiler can't substitute constants because it can't know whether another function has altered the globals' values.
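As a rough sketch of the first approach (the function names and signatures here are my own illustration, not part of the original test): because these functions have external linkage and take their operands as parameters, the compiler must emit a real divide and a real AND, unless link-time optimization lets it see every caller.

#include <stdint.h>

/* Externally visible (not 'static'), so the compiler cannot assume
   anything about the values callers will pass in. */
int testMod( int64_t value, int64_t divisor )
{
    return ( value % divisor ) == 0;  /* forces a genuine divide */
}

int testAnd( int64_t value, int64_t mask )
{
    return ( value & mask ) == 0;     /* a single AND */
}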
Incidentally, this same issue exists for divide operations. Divisions by constant powers of two can be substituted with an equivalent right-shift (>>) operation, and the same trick works for multiplication (<<), though the benefits there are smaller (or nonexistent). True division just takes a long time in hardware; although significant improvements have been made in modern processors versus even 15 years ago, a division may still take on the order of 80 clock cycles, while a >> operation takes only a single cycle. You're not going to see an "order of magnitude" improvement using bitwise tricks on modern processors, but most compilers will still use those tricks because there is still some noticeable improvement.
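As a quick illustration (my own example, not from the answer above), the equivalences hold exactly for unsigned operands; for signed negative values a plain right-shift rounds toward negative infinity while C division truncates toward zero, which is why compilers emit a small fix-up sequence for signed operands:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t x = 1000003;

    assert( ( x / 8 ) == ( x >> 3 ) );  /* divide by 2^3   == shift right 3   */
    assert( ( x % 8 ) == ( x & 7 ) );   /* mod by 2^3      == AND with 2^3-1  */
    assert( ( x * 8 ) == ( x << 3 ) );  /* multiply by 2^3 == shift left 3    */

    return 0;
}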
EDIT: On some embedded processors (and, unbelievable though it was, the original SPARC desktop/workstation processors before v8), there isn't even a divide instruction at all. All true divide and mod operations on such processors must be performed entirely in software, which can be a monstrously expensive operation. In that sort of environment, you surely would see an order-of-magnitude difference.
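To give a feel for why that's so expensive, here is a sketch of classic shift-and-subtract (restoring) division; this is my own illustration, not any particular runtime library's routine. It needs one loop iteration per bit of the operand instead of one instruction:

#include <stdint.h>

/* Computes n / d and n % d one bit at a time (assumes d != 0),
   roughly what a software divide routine must do on a CPU
   with no divide instruction. */
void udiv32( uint32_t n, uint32_t d, uint32_t *q, uint32_t *r )
{
    uint32_t quotient = 0, remainder = 0;
    for( int bit = 31; bit >= 0; --bit )
    {
        /* Shift the next bit of the dividend into the remainder. */
        remainder = ( remainder << 1 ) | ( ( n >> bit ) & 1u );
        if( remainder >= d )
        {
            remainder -= d;
            quotient |= 1u << bit;
        }
    }
    *q = quotient;
    *r = remainder;
}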