
Puzzling performance difference between mac and a relatively powerful desktop

My original intention in writing this piece of code was to measure the performance difference between operating on an entire array inside a function and calling a function on the individual elements of the array.

i.e. comparing the following two statements:

function_vector(x, y, z, n); 

vs

for(int i=0; i<n; i++){
        function_scalar(x[i], y[i], z[i]);
}

where function_* does some substantial but identical calculations.
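For readers who want to try the comparison without opening the linked code, the setup can be sketched roughly as follows. This is only a sketch under stated assumptions: the arithmetic inside the two functions, the `BenchResult` struct, and the `run_benchmark` helper are hypothetical stand-ins and are not taken from the actual code. Only the names function_vector and function_scalar come from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical per-element work (assumption: the real calculation differs).
inline void function_scalar(double &x, const double y, const double z) {
    x = std::sin(y) * std::cos(z) + std::sqrt(y * y + z * z);
}

// The same work, with the loop over the arrays inside the callee.
void function_vector(double *x, const double *y, const double *z, const int n) {
    for (int i = 0; i < n; i++) {
        x[i] = std::sin(y[i]) * std::cos(z[i]) + std::sqrt(y[i] * y[i] + z[i] * z[i]);
    }
}

// Hypothetical result record for one benchmark run.
struct BenchResult {
    double vector_seconds;
    double scalar_seconds;
    double max_abs_diff;  // largest difference between the two outputs
};

// Hypothetical helper: times both variants on the same inputs.
BenchResult run_benchmark(const int n, const int repeats) {
    assert(n > 0 && repeats > 0);
    std::vector<double> xv(n), xs(n), y(n), z(n);
    for (int i = 0; i < n; i++) {
        y[i] = 0.001 * i;
        z[i] = 1.0 + 0.002 * i;
    }

    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto t0 = clock::now();
    for (int r = 0; r < repeats; r++) {
        function_vector(xv.data(), y.data(), z.data(), n);
    }
    const auto t1 = clock::now();
    for (int r = 0; r < repeats; r++) {
        for (int i = 0; i < n; i++) {
            function_scalar(xs[i], y[i], z[i]);
        }
    }
    const auto t2 = clock::now();

    // Comparing the outputs also gives the results an observable use.
    double max_abs_diff = 0.0;
    for (int i = 0; i < n; i++) {
        max_abs_diff = std::max(max_abs_diff, std::fabs(xv[i] - xs[i]));
    }
    return {std::chrono::duration<double>(t1 - t0).count(),
            std::chrono::duration<double>(t2 - t1).count(),
            max_abs_diff};
}
```

Calling `run_benchmark` from `main` and printing the two timing fields would give output in the same shape as the timings reported in this question. An optimizer may still hoist or merge the repeated calls, so the numbers from such a sketch should be read with some caution.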

With -ffast-math turned on, the scalar version is roughly 2x faster than the vector version on the multiple machines I have tested on.

However, what's puzzling is the comparison of timings on two different machines, both using gcc 6.3.0:

# on desktop with Intel-Core-i7-4930K-Processor-12M-Cache-up-to-3_90-GHz
g++ loop_test.cpp -o loop_test -std=c++11 -O3
./loop_test 
vector time = 12.3742 s
scalar time = 10.7406 s

g++ loop_test.cpp -o loop_test -std=c++11 -O3 -ffast-math
./loop_test 
vector time = 11.2543 s
scalar time = 5.70873 s


# on mac with Intel-Core-i5-4258U-Processor-3M-Cache-up-to-2_90-GHz
g++ loop_test.cpp -o loop_test -std=c++11 -O3
./loop_test 
vector time = 2.89193 s
scalar time = 1.87269 s

g++ loop_test.cpp -o loop_test -std=c++11 -O3 -ffast-math
./loop_test 
vector time = 2.38422 s
scalar time = 0.995433 s

By every measure (cache size, clock speed, etc.) the first machine is superior. Still, the code runs roughly 5x faster on the second machine.

Question:

Can this be explained? Or am I doing something wrong here?

Link to the code: https://gist.github.com/anandpratap/262a72bd017fdc6803e23ed326847643

Edit

After comments from ShadowRanger, I added the __restrict__ keyword to function_vector and the -march=native compilation flag. This gives:

# on desktop with Intel-Core-i7-4930K-Processor-12M-Cache-up-to-3_90-GHz
vector time = 1.3767 s
scalar time = 1.28002 s

# on mac with Intel-Core-i5-4258U-Processor-3M-Cache-up-to-2_90-GHz 
vector time = 1.05206 s
scalar time = 1.07556 s
asked Mar 18 '17 05:03 by 0b1100001


1 Answer

Odds are that possible pointer aliasing is limiting optimizations in the function_vector case.

Try changing the declaration of function_vector to:

void function_vector(double *__restrict__ x, double *__restrict__ y, double *__restrict__ z, const int n){

to use g++'s non-standard support for a feature matching C99's restrict keyword.

Without it, the compiler likely has to assume that the writes to x[i] in function_vector could be modifying values in y or z, so it can't read those values ahead of the writes or freely reorder the loop's loads and stores.
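To see why the compiler has to be this cautious, here is a minimal sketch in which overlapping arguments really do change the result. The functions `scale_plain`, `scale_restrict`, `overlapping_call`, and `disjoint_call` are hypothetical names invented for this illustration and are not part of the question's code. `__restrict__` is the GCC/Clang spelling, so this assumes one of those compilers.

```cpp
#include <cassert>
#include <vector>

// No restrict: the compiler must preserve the behavior when x overlaps y.
void scale_plain(double *x, const double *y, const int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        x[i] = 2.0 * y[i];
    }
}

// With __restrict__ the caller promises that x and y do not overlap,
// so loads from y may be moved ahead of stores to x.
// Calling this with overlapping arrays would be undefined behavior.
void scale_restrict(double *__restrict__ x, const double *__restrict__ y, const int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        x[i] = 2.0 * y[i];
    }
}

// x starts one element after y, so each store feeds the next load.
std::vector<double> overlapping_call(const int n) {
    std::vector<double> buf(n + 1, 1.0);
    scale_plain(buf.data() + 1, buf.data(), n);
    return buf;
}

// Separate buffers: safe for both variants, and both give the same result.
std::vector<double> disjoint_call(const int n) {
    std::vector<double> y(n, 1.0), x(n, 0.0);
    scale_restrict(x.data(), y.data(), n);
    return x;
}
```

With n = 4, the overlapping call yields 1, 2, 4, 8, 16 because every write changes the value read in the next iteration, while the disjoint call yields 2 in every element. A compiler that loaded several y values before storing to x would break the first case, which is why it either keeps the order or inserts a runtime overlap check unless told the pointers cannot alias.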

answered Nov 03 '22 17:11 by ShadowRanger