Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why can GCC only do loop interchange optimization when the int size is a compile-time constant?

When I compile this snippet (with -Ofast -floop-nest-optimize) gcc generates assembly which traverses the array in source order.

However, if I uncomment the line // n = 32767 and assign any number to n, it interchanges the index order to x[i * n + j]. Traversing memory in contiguous row-major order is much more cache-friendly than striding down columns.

float matrix_sum_column_major(float* x, int n) {
    // n = 32767;
    float sum = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += x[j * n + i];
    return sum;
}

On godbolt

Why can't GCC or clang do loop interchange with a runtime-variable int size? Real-world code won't usually have the size declared explicitly.

PD: I've tried this with different versions of gcc and clang-9 and it seems to happen in both.
PD2: Even if I make x be a local variable malloced inside the function it still happens.

like image 391
Mateo de Mayo Avatar asked Oct 15 '25 11:10

Mateo de Mayo


1 Answers

Compilers generally focus their efforts (and should focus their efforts) on places where constructs which will likely be used by programmers interested in efficiency can be replaced with other constructs that are easily proven to be equivalent in all cases that should be expected to matter. If n is a constant, a compiler can determine the exact set of array indices that will be used in the loop and then figure out how to process all those indices. If n isn't constant, a compiler might be able to determine that when n is positive, code will use all indices from 0 to n*n-1, but that would likely require a lot more effort. The authors of clang and might have been able to make such a determination in this case if they tried hard enough, but they likely thought the effort wasn't worthwhile.

Note that if code will use a few particular values of n far more than any others, having code explicitly check for those values and use loops tailored for them, a compiler might be able to generate far more efficient code for those loops than would be possible for loops that can use an arbitrary n. Because many real-world problems would likely have some values of n that get used much more than others, it would not be unreasonable for a compiler writer to assume that programmers interested in performance would be likely to use such special-purpose loops, and spending a certain amount of effort improving the arbitrary-n loop may offer less benefit than spending the same amount of effort elsewhere.

like image 152
supercat Avatar answered Oct 17 '25 01:10

supercat



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!