x64 performance compared to x86

I wrote this little program in C++ in order to check CPU load scenarios.

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <time.h>
int main()
{

    double x = 1;
    int t1 = GetTickCount();
    srand(10000);

    for (unsigned long i = 0; i < 10000000; i++)
    {
        int r = rand();
        double l = sqrt((double)r);
        x *= log(l/3) * pow(x, r);
    }

    int t2 = GetTickCount();
    printf("Time: %d\r\n", t2-t1);
    getchar();
}

I compiled it for both x86 and x64 on Windows 7 x64.
For some reason, the x64 version finished in about 3 seconds,
but the x86 version took 48 (!!!) seconds.
I ran it many times and always got similar results.
What could cause this difference?

Asked by Idov, Nov 29 '22 17:11


1 Answer

Looking at the assembler output with /Ox (maximum optimizations), the reason for the speed difference between the x86 and x64 builds is obvious:

; cl /Ox /Fa tick.cpp
; x86 Line 17: x *= log(l/3) * pow(x, r)
fld     QWORD PTR _x$[esp+32]
mov     eax, esi
test    esi, esi
; ...

We see that x87 floating-point instructions (fld) are being used for this computation. Compare this to the x64 build:

; cl /Ox /Fa tick.cpp
; x64 Line 17: x *= log(l/3) * pow(x, r)
movapd  xmm1, xmm8
mov     ecx, ebx
movapd  xmm5, xmm0
test    ebx, ebx
; ...

Now we see SSE instructions being used instead.

You can pass /arch:SSE2 to try to coax Visual Studio 2010 into producing similar instructions for the x86 build, but it appears the 64-bit compiler simply produces much faster assembly for the task at hand.
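If you want to confirm which floating-point instruction set a given build was actually compiled for, the compiler's predefined macros can tell you. The following is a minimal sketch, not part of the original program: fp_instruction_set is a hypothetical helper name, _M_X64 and _M_IX86_FP are the MSVC macros, and the GCC/Clang branches (__x86_64__, __SSE2_MATH__, __i386__) are included only so the snippet also builds outside Visual Studio.

```cpp
#include <string>

// Hypothetical helper (not from the original program): reports which
// floating-point instruction set this translation unit was compiled for,
// based on predefined compiler macros.
std::string fp_instruction_set()
{
#if defined(_M_X64) || defined(__x86_64__)
    // SSE2 is part of the x64 baseline, so no /arch switch is needed.
    return "SSE2 (x64 baseline)";
#elif defined(_M_IX86_FP)
    // MSVC on x86: _M_IX86_FP reflects the /arch option in effect.
    #if _M_IX86_FP >= 2
    return "SSE2 (x86, /arch:SSE2 or higher)";
    #elif _M_IX86_FP == 1
    return "SSE (x86, /arch:SSE; doubles still use x87)";
    #else
    return "x87 (x86, no SSE code generation)";
    #endif
#elif defined(__SSE2_MATH__)
    // GCC/Clang on 32-bit x86 with SSE2 floating-point math enabled.
    return "SSE2 (x86, SSE math)";
#elif defined(__i386__)
    return "x87 (x86)";
#else
    return "unknown (not an x86 target)";
#endif
}
```

Printing the returned string from the x86 build with and without /arch:SSE2 should show whether the switch took effect; it says nothing about how well the compiler uses those instructions, which is what the /Fa listing is for.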

Finally, if you relax the floating-point model (/fp:fast), the x86 and x64 builds perform nearly identically.
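One general reason a strict floating-point model restricts the optimizer is that floating-point addition is not associative, so the compiler may not regroup operations unless it is told that small differences in the result are acceptable. The sketch below only illustrates that property; sum_left and sum_right are hypothetical names, and I have not established that reassociation in particular is what dominated the timings here.

```cpp
#include <cassert>

// Hypothetical helpers for illustration. The volatile temporaries keep the
// compiler from regrouping or folding the additions.

// Computes (a + b) + c.
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;
    return t + c;
}

// Computes a + (b + c).
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;
    return a + t;
}

// With a = 1e20, b = -1e20, c = 1.0:
//   sum_left  gives 1.0, because a + b is exactly 0.
//   sum_right gives 0.0, because 1.0 is lost when added to -1e20.
inline void check_non_associative()
{
    assert(sum_left(1e20, -1e20, 1.0) == 1.0);
    assert(sum_right(1e20, -1e20, 1.0) == 0.0);
}
```

Under a relaxed model the compiler is allowed to treat the two groupings as interchangeable, which gives it more freedom at the cost of bit-for-bit reproducibility.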

Timings in GetTickCount ticks (milliseconds), unscientific best of 3:

  • x86, /Ox: 22704 ticks
  • x64, /Ox: 822 ticks
  • x86, /Ox /arch:SSE2: 3432 ticks
  • x64, /Ox /favor:INTEL64: 1014 ticks
  • x86, /Ox /arch:SSE2 /fp:fast: 834 ticks
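For anyone who wants to repeat a "best of 3" measurement like the one above without windows.h, here is a rough sketch using std::chrono::steady_clock. It is not the harness I used for the numbers above: workload and best_of_ms are hypothetical names, and workload simply repeats the arithmetic from the question with the iteration count as a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <limits>

// Same arithmetic as the loop in the question; the result is returned so the
// caller can store it and the optimizer cannot discard the loop.
double workload(unsigned long iterations)
{
    double x = 1;
    std::srand(10000);
    for (unsigned long i = 0; i < iterations; i++)
    {
        int r = std::rand();
        double l = std::sqrt((double)r);
        x *= std::log(l / 3) * std::pow(x, r);
    }
    return x;
}

// Runs work() the given number of times and returns the shortest elapsed
// time in whole milliseconds (the same unit GetTickCount reports).
template <typename Work>
long long best_of_ms(int runs, Work work)
{
    assert(runs > 0);
    typedef std::chrono::steady_clock clock;
    long long best = std::numeric_limits<long long>::max();
    for (int i = 0; i < runs; i++)
    {
        clock::time_point start = clock::now();
        work();
        clock::time_point stop = clock::now();
        long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
        best = std::min(best, ms);
    }
    return best;
}
```

A call such as best_of_ms(3, [&] { sink = workload(10000000); }) with a volatile double sink declared beforehand would mirror the original test; taking the minimum rather than the average filters out runs disturbed by other processes.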

Answered by user7116, Dec 01 '22 05:12