I wrote this little program in C++ in order to check CPU load scenarios.
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <windows.h>
#include <time.h>

int main()
{
    double x = 1;
    DWORD t1 = GetTickCount();          // milliseconds since system start
    srand(10000);                       // fixed seed so every run does the same work
    for (unsigned long i = 0; i < 10000000; i++)
    {
        int r = rand();
        double l = sqrt((double)r);
        x *= log(l/3) * pow(x, r);
    }
    DWORD t2 = GetTickCount();
    printf("Time: %lu\r\n", t2 - t1);   // elapsed milliseconds
    getchar();                          // keep the console window open
}
I compiled it both for x86 and for x64 on win7 x64.
For some reason when I ran the x64 version it finished running in about 3 seconds
but when I tried it with the x86 version it took 48 (!!!) seconds.
I tried it many times and always got similar results.
What could cause this difference?
Looking at the assembler output with /Ox (maximum optimizations), the speed difference between the x86 and x64 builds is obvious:
; cl /Ox /Fa tick.cpp
; x86 Line 17: x *= log(l/3) * pow(x, r)
fld QWORD PTR _x$[esp+32]
mov eax, esi
test esi, esi
; ...
We see that x87 instructions are being used for this computation. Compare this to the x64 build:
; cl /Ox /Fa tick.cpp
; x64 Line 17: x *= log(l/3) * pow(x, r)
movapd xmm1, xmm8
mov ecx, ebx
movapd xmm5, xmm0
test ebx, ebx
; ...
Now we see SSE instructions being used instead.
You can pass /arch:SSE2 to try to coax Visual Studio 2010 into producing similar instructions, but it appears the 64-bit compiler simply produces much faster assembly for the task at hand.
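To confirm which floating-point code generation a given build actually targets, you can check the compiler's predefined macros. The following is only a minimal sketch: it assumes MSVC's _M_X64 and _M_IX86_FP macros, with GCC/Clang's __x86_64__, __i386__ and __SSE2_MATH__ as a fallback, and fp_codegen is a hypothetical helper name that is not part of the original program.

```cpp
#include <string>

// Hypothetical helper: describes the floating-point code generation
// this translation unit was compiled for, based on predefined macros.
std::string fp_codegen()
{
#if defined(_M_X64) || defined(__x86_64__)
    // SSE2 is part of the x64 baseline, so scalar double math uses it.
    return "x64: SSE2";
#elif defined(_M_IX86_FP)
    // MSVC x86: 0 = no /arch:SSE*, 1 = /arch:SSE, 2 = /arch:SSE2 or higher.
    #if _M_IX86_FP >= 2
    return "x86: SSE2";
    #elif _M_IX86_FP == 1
    return "x86: SSE";
    #else
    return "x86: x87";
    #endif
#elif defined(__i386__)
    // GCC/Clang x86: __SSE2_MATH__ is set when double math goes through SSE2.
    #if defined(__SSE2_MATH__)
    return "x86: SSE2";
    #else
    return "x86: x87";
    #endif
#else
    return "other target";
#endif
}
```

Printing fp_codegen() at the start of the benchmark makes it easier to tell which of the x86 and x64 builds a given timing came from.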
Finally, if you relax the floating-point model with /fp:fast, the x86 and x64 builds perform nearly identically.
Timings, unscientific best of 3:
x86, /Ox: 22704 ticks
x64, /Ox: 822 ticks
x86, /Ox /arch:SSE2: 3432 ticks
x64, /Ox /favor:INTEL64: 1014 ticks
x86, /Ox /arch:SSE2 /fp:fast: 834 ticks
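If you want to repeat a "best of 3" measurement yourself, a small harness along these lines may help. This is a sketch under assumptions rather than the code used for the numbers above: it uses std::chrono::steady_clock instead of GetTickCount, so it needs a C++11 compiler (newer than the Visual Studio 2010 discussed here), and workload and best_of are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <limits>

// Same arithmetic as the loop in the question, with a configurable
// iteration count so short test runs are possible.
double workload(unsigned long iterations)
{
    double x = 1;
    std::srand(10000);  // fixed seed: every run does identical work
    for (unsigned long i = 0; i < iterations; i++)
    {
        int r = std::rand();
        double l = std::sqrt((double)r);
        x *= std::log(l / 3) * std::pow(x, r);
    }
    return x;
}

// Runs the workload `runs` times and returns the fastest wall-clock
// time in milliseconds. The last result is stored in `sink` so the
// computed value is actually used by the caller.
long long best_of(int runs, unsigned long iterations, double& sink)
{
    typedef std::chrono::steady_clock clock_type;
    long long best = std::numeric_limits<long long>::max();
    for (int i = 0; i < runs; i++)
    {
        clock_type::time_point start = clock_type::now();
        sink = workload(iterations);
        clock_type::time_point stop = clock_type::now();
        long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                           stop - start).count();
        best = std::min(best, ms);
    }
    return best;
}
```

Calling best_of(3, 10000000, sink) matches the iteration count of the original program. Taking the minimum rather than the average reduces noise from other processes, but it is still an unscientific measurement.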