My original intention in writing this piece of code was to measure the performance difference between a function that operates on an entire array and a function that operates on individual elements of the array,
i.e. comparing the following two statements:
function_vector(x, y, z, n);
vs
for(int i=0; i<n; i++){
function_scalar(x[i], y[i], z[i]);
}
where function_* does some substantial but identical calculations.
With -ffast-math turned on, the scalar version is roughly 2x faster on multiple machines I have tested on.
However, what's puzzling is the comparison of timings on two different machines, both using gcc 6.3.0:
# on desktop with Intel-Core-i7-4930K-Processor-12M-Cache-up-to-3_90-GHz
g++ loop_test.cpp -o loop_test -std=c++11 -O3
./loop_test
vector time = 12.3742 s
scalar time = 10.7406 s
g++ loop_test.cpp -o loop_test -std=c++11 -O3 -ffast-math
./loop_test
vector time = 11.2543 s
scalar time = 5.70873 s
# on mac with Intel-Core-i5-4258U-Processor-3M-Cache-up-to-2_90-GHz
g++ loop_test.cpp -o loop_test -std=c++11 -O3
./loop_test
vector time = 2.89193 s
scalar time = 1.87269 s
g++ loop_test.cpp -o loop_test -std=c++11 -O3 -ffast-math
./loop_test
vector time = 2.38422 s
scalar time = 0.995433 s
By all measures the first machine is superior (cache size, clock speed, etc.). Still, the code runs roughly 5x faster on the second machine.
Question:
Can this be explained? Or am I doing something wrong here?
Link to the code: https://gist.github.com/anandpratap/262a72bd017fdc6803e23ed326847643
Edit
After comments from ShadowRanger, I added the __restrict__ keyword to function_vector and the -march=native compilation flag. This gives:
# on desktop with Intel-Core-i7-4930K-Processor-12M-Cache-up-to-3_90-GHz
vector time = 1.3767 s
scalar time = 1.28002 s
# on mac with Intel-Core-i5-4258U-Processor-3M-Cache-up-to-2_90-GHz
vector time = 1.05206 s
scalar time = 1.07556 s
Odds are that possible pointer aliasing is limiting optimizations in the vectorized case.
Try changing the declaration of function_vector to:
void function_vector(double *__restrict__ x, double *__restrict__ y, double *__restrict__ z, const int n){
to use g++'s non-standard support for a feature matching C99's restrict keyword.
Without it, function_vector likely has to assume that the writes to x[i] could be modifying values in y or z, so it can't do read-ahead to get the values.