For the following function, the code compiled with optimizations is vectorized and the computation is performed in registers (the return value is returned in eax). The generated machine code is, e.g., here: https://godbolt.org/z/VQEBV4.
    int sum(int *arr, int n) {
        int ret = 0;
        for (int i = 0; i < n; i++)
            ret += arr[i];
        return ret;
    }
However, if I make ret a global variable (or a parameter of type int&), vectorization is not used and the compiler stores the updated ret to memory in each iteration. Machine code: https://godbolt.org/z/NAmX4t.
    int ret = 0;
    int sum(int *arr, int n) {
        for (int i = 0; i < n; i++)
            ret += arr[i];
        return ret;
    }
I don't understand why the optimizations (vectorization/computations in registers) are prevented in the latter case. There is no threading, even the increments are not performed atomically. Moreover, this behavior seems to be consistent across compilers (GCC, Clang, Intel), so I believe there must be some reason for it.
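If ret is not local but global, arr might alias ret, which reduces the opportunity to optimize. Both are accessed as int, so the compiler has to assume that some arr[i] may be ret itself. It therefore cannot keep ret in a register across iterations (or add several elements at once) and must store the updated value to memory each time, so that a later load of arr[i] sees it.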
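The aliasing concern can be made concrete with the int& variant mentioned in the question. The following is only a sketch: the names sum_ref, sum_ref_local and demo are hypothetical and not from the original post, and the local-accumulator rewrite is one possible workaround, not something the answer prescribes. It shows that the two versions give different results when ret refers to an element of arr, which is why the compiler is not allowed to make this transformation on its own.

```cpp
#include <cassert>

// Same loop as in the question, with ret passed by reference.
// ret may refer to an element of arr, so each update must be
// written to memory before the next arr[i] is loaded.
int sum_ref(int *arr, int n, int &ret) {
    for (int i = 0; i < n; i++)
        ret += arr[i];
    return ret;
}

// Hypothetical rewrite: accumulate in a local and write back once.
// The local cannot alias arr, so the loop has the same shape as the
// first version in the question. Whether a given compiler actually
// vectorizes it depends on the compiler and its flags.
int sum_ref_local(int *arr, int n, int &ret) {
    int acc = ret;
    for (int i = 0; i < n; i++)
        acc += arr[i];
    ret = acc;
    return ret;
}

// Demonstrates that the two versions are only equivalent
// when ret does not alias an element of arr.
void demo() {
    int a[3] = {1, 2, 3};
    int b[3] = {1, 2, 3};

    // ret aliases the last element: a[2] goes 3 -> 4 -> 6 -> 12.
    assert(sum_ref(a, 3, a[2]) == 12);
    // The local copy never sees its own updates: 3 + 1 + 2 + 3 = 9.
    assert(sum_ref_local(b, 3, b[2]) == 9);

    // Without aliasing, both versions agree.
    int c[3] = {1, 2, 3};
    int t1 = 0, t2 = 0;
    assert(sum_ref(c, 3, t1) == 6);
    assert(sum_ref_local(c, 3, t2) == 6);
}
```

If the caller can guarantee that ret never overlaps arr, the local-accumulator form states that assumption in the code itself. With overlap, it silently changes the result, so it is a change in semantics rather than a pure optimization.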