
Computation is optimized only if variable updated in loop is local

For the following function, with optimizations enabled the compiler vectorizes the loop and keeps the running sum in registers (the result is returned in eax). Generated machine code is, e.g., here: https://godbolt.org/z/VQEBV4.

int sum(int *arr, int n) {
  int ret = 0;
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}

However, if I make ret a global variable (or a parameter of type int&), the loop is not vectorized and the compiler stores the updated ret to memory in each iteration. Machine code: https://godbolt.org/z/NAmX4t.

int ret = 0;

int sum(int *arr, int n) {
  for (int i = 0; i < n; i++)
    ret += arr[i];
  return ret;
}
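
For completeness, the int& variant mentioned above might look like the sketch below. The name sum_ref and the parameter order are only illustrative assumptions; the linked machine code was generated for the global version.

```cpp
// Hypothetical variant: the accumulator is passed in by reference
// instead of being a global. The name sum_ref is illustrative only.
int sum_ref(int *arr, int n, int &ret) {
  for (int i = 0; i < n; i++)
    ret += arr[i];  // ret refers to an object owned by the caller
  return ret;
}
```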

I don't understand why the optimizations (vectorization/computations in registers) are prevented in the latter case. There is no threading involved, and the increments are not even performed atomically. Moreover, this behavior seems to be consistent across compilers (GCC, Clang, Intel), so I believe there must be some reason for it.

Daniel Langr asked Oct 16 '22 08:10


1 Answer

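If ret is global rather than local, arr might alias ret: some arr[i] could be the very object that ret names. The compiler then has to assume that each store to ret may change a later arr[i] load, so it writes ret back to memory in every iteration, which reduces the opportunity to optimize. The same reasoning applies to a parameter of type int&.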
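
A minimal sketch of the aliasing case, using the int& form so that the overlap stays within one array. The names sum_into and sum_into_local are hypothetical and only serve this illustration. The second function accumulates in a local and stores once, which is roughly the shape the optimizer would need to produce, and it gives a different answer when the accumulator overlaps the array. That is why the compiler may not make this change on its own.

```cpp
// Accumulates directly into ret, as in the non-local version of the question.
int sum_into(int *arr, int n, int &ret) {
  for (int i = 0; i < n; i++)
    ret += arr[i];  // if ret aliases some arr[j], this store changes arr[j]
  return ret;
}

// Hypothetical manual rewrite: accumulate in a local, store once at the end.
// Only equivalent to sum_into when ret does not overlap arr[0..n).
int sum_into_local(int *arr, int n, int &ret) {
  int local = ret;
  for (int i = 0; i < n; i++)
    local += arr[i];  // arr is read unmodified for the whole loop
  ret = local;
  return local;
}
```

With int a[3] = {1, 2, 3}, the call sum_into(a, 3, a[2]) returns 12, because a[2] has already grown to 6 by the time it is read as arr[2]. On a fresh copy of the same array, sum_into_local returns 9. When the caller knows there is no overlap, writing the local-accumulator form by hand lets the compiler keep the sum in a register.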

Jarod42 answered Oct 19 '22 01:10