Profile C Execution

Tags:

So, just for fun, and out of curiosity, I wanted to see what executes faster when doing an even-odd check, modulus or bitwise comparisons.

So, I whipped up the following, but I'm not sure that it's behaving correctly, as the difference is so small. I read somewhere online that bitwise should be an order of magnitude faster than modulus checking.

Is it possible that it's getting optimized away? I've just started tinkering with assembly, otherwise I'd attempt to dissect the executable a bit.

EDIT 3: Here is a working test, thanks in a large way to @phonetagger:

Click to copy

#include <stdio.h>
#include <time.h>
#include <stdint.h>

// to reset the global
static const int SEED = 0x2A;

// 5B iterations, each
static const int64_t LOOPS = 5000000000;

int64_t globalVar;

// gotta call something
int64_t doSomething( int64_t input )
{
  return 1 + input;
}

int main(int argc, char *argv[]) 
{
  globalVar = SEED;

  // mod  
  clock_t startMod = clock();

  for( int64_t i=0; i<LOOPS; ++i )
  {    
    if( ( i % globalVar ) == 0 )
    {
      globalVar = doSomething(globalVar);      
    }    
  }

  clock_t endMod = clock();

  double modTime = (double)(endMod - startMod) / CLOCKS_PER_SEC;

  globalVar = SEED;

  // bit
  clock_t startBit = clock();

  for( int64_t j=0; j<LOOPS; ++j )
  {
    if( ( j & globalVar ) == 0 )
    {
      globalVar = doSomething(globalVar);
    }       
  }

  clock_t endBit = clock();

  double bitTime = (double)(endBit - startBit) / CLOCKS_PER_SEC;

  printf("Mod: %lf\n", modTime);
  printf("Bit: %lf\n", bitTime);  
  printf("Dif: %lf\n", ( modTime > bitTime ? modTime-bitTime : bitTime-modTime ));
}

5 billion iterations of each loop, with a global removing compiler optimization yields the following:

Click to copy

Mod: 93.099101
Bit: 16.701401
Dif: 76.397700

986

asked Jul 19 '12 16:07

Josh

2 Answers

gcc foo.c -std=c99 -S -O0 (note, I specifically did -O0) for x86 gave me the same assembly for both loops. Operator strength reduction meant that both ifs used an andl to get the job done (which is faster than a modulo on Intel machines):

First Loop:

Click to copy

.L6:
        movl    72(%esp), %eax
        andl    $1, %eax
        testl   %eax, %eax
        jne     .L5
        call    doNothing
.L5:
        addl    $1, 72(%esp)
.L4:
        movl    LOOPS, %eax
        cmpl    %eax, 72(%esp)
        jl      .L6

Second Loop:

Click to copy

.L9:
        movl    76(%esp), %eax
        andl    $1, %eax
        testl   %eax, %eax
        jne     .L8
        call    doNothing
.L8:
        addl    $1, 76(%esp)
.L7:
        movl    LOOPS, %eax
        cmpl    %eax, 76(%esp)
        jl      .L9

The miniscule difference you see is probably because of the resolution/inaccuracy of clock.

157

answered Sep 17 '22 23:09

ArjunShankar

Most compilers will compile both of the following to EXACTLY the same machine instruction(s):

Click to copy

if( ( i % 2 ) == 0 )

if( ( i & 1 ) == 0 )

...even without ANY "optimization" turned on. The reason is that you are MOD-ing and AND-ing with constant values, and a %2 operation is, as any compiler writer should know, functionally equivalent to an &1 operation. In fact, MOD by any power-of-2 has an equivalent AND operation. If you really want to test the difference, you'll need to make the right-hand-side of both operations be variable, and to be absolutely sure the compiler's cleverness isn't thwarting your efforts, you'll need to bury the variables' initializations somewhere that the compiler can't tell at that point what its runtime value will be; i.e. you'll need to pass the values into a GLOBALLY-DECLARED (i.e. not 'static') test function as parameters, in which case the compiler can't trace back to their definition & substitute the variables with constants, because theoretically any external caller could pass any values in for those parameters. Alternatively, you could leave the code in main() and define the variables globally, in which case the compiler can't substitute them with constants because it can't know for sure that another function may have altered the value of the global variables.

Incidentally, this same issue exists for divide operations.... Divisions by constant powers-of-two can be substituted with an equivalent right-shift (>>) operation. The same trick works for multiplication (<<), but the benefits are less (or nonexistant) for multiplications. True division operations just take a long time in hardware, though significant improvements have been made in most modern processors vs. even 15 years ago, division operations still take maybe 80 clock cycles, while a >> operation takes only a single cycle. You're not going to see an "order of magnitude" improvement using bitwise tricks on modern processors, but most compilers will still use those tricks because there is still some noticeable improvement.

EDIT: On some embedded processors (and, unbelievable though it was, the original Sparc desktop/workstation processor versions before v8), there isn't even a divide instruction at all. All true divide & mod operations on such processors must be performed entirely in software, which can be a monstrously expensive operation. In that sort of environment, you surely would see an order of magnitude difference.

answered Sep 20 '22 23:09

phonetagger

Related questions
                            
                                Communication between Linux programs
                            
                                Tensor multiplication library
                            
                                C wrong sum result with doubles
                            
                                Wcsstr no case sensitivity
                            
                                Building the library for check test C framework
                            
                                Parsing fuzzy dates in Haskell (or C)
                            
                                Assigning values to a key in a pointer to a record in C
                            
                                how to TRIM a block on SSD disk?
                            
                                Using macros in C to define data structures
                            
                                Can't understand the working of getint() in C as per K&R
                            
                                Pointer cast for use with qsort
                            
                                C strings in a switch statement [duplicate]
                            
                                Integral promotion
                            
                                Building a C shell. execvp returns 'No such file' error. creating argv array on-the-fly with malloc
                            
                                Pointers , Multidimensional arrays and Addresses
                            
                                UTF-8 Width Display Issue of Chinese Characters
                            
                                what is wrong with this call to the java method?
                            
                                Do I need to disconnect an xcb_connection_t that I got from XGetXCBConnection?
                            
                                Invoking debugger from within a C program
                            
                                Porting module to newer Linux kernel: Cannot allocate memory

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Profile C Execution

Tags:

c

execution-time

profiler

clock

Josh

People also ask

2 Answers

ArjunShankar

phonetagger

Recent Activity

Donate For Us