When I compile this snippet (with -Ofast -floop-nest-optimize
) gcc generates assembly which traverses the array in source order.
However, if I uncomment the line // n = 32767
and assign any number to n
, it interchanges the index order to x[i * n + j]
. Traversing memory in contiguous row-major order is much more cache-friendly than striding down columns.
float matrix_sum_column_major(float* x, int n) {
// n = 32767;
float sum = 0;
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
sum += x[j * n + i];
return sum;
}
On godbolt
Why can't GCC or clang do loop interchange with a runtime-variable int
size? Real-world code won't usually have the size declared explicitly.
PD: I've tried this with different versions of gcc and clang-9 and it seems to happen in both.
PD2: Even if I make x
be a local variable malloc
ed inside the function it still happens.
Compilers generally focus their efforts (and should focus their efforts) on places where constructs which will likely be used by programmers interested in efficiency can be replaced with other constructs that are easily proven to be equivalent in all cases that should be expected to matter. If n
is a constant, a compiler can determine the exact set of array indices that will be used in the loop and then figure out how to process all those indices. If n
isn't constant, a compiler might be able to determine that when n
is positive, code will use all indices from 0
to n*n-1
, but that would likely require a lot more effort. The authors of clang and might have been able to make such a determination in this case if they tried hard enough, but they likely thought the effort wasn't worthwhile.
Note that if code will use a few particular values of n
far more than any others, having code explicitly check for those values and use loops tailored for them, a compiler might be able to generate far more efficient code for those loops than would be possible for loops that can use an arbitrary n
. Because many real-world problems would likely have some values of n
that get used much more than others, it would not be unreasonable for a compiler writer to assume that programmers interested in performance would be likely to use such special-purpose loops, and spending a certain amount of effort improving the arbitrary-n
loop may offer less benefit than spending the same amount of effort elsewhere.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With