I was working on highly "vectorizable" code and noticed that Clang handles the C++ __restrict keyword/extension differently from GCC, and less usefully, even in a simple case.
For the compiler-generated code, the slowdown is about 15x (in my specific case, not in the example below).
Here is the code (also available at https://godbolt.org/z/sdGd43x75):
struct Param {
    int *x;
};

int foo(int *a, int *b) {
    *a = 5;
    *b = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int foo(Param a, Param b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a.x + *b.x;
}

/////////////////////

struct ParamR {
    // "Restricted pointers assert that members point to disjoint storage"
    // https://en.cppreference.com/w/c/language/restrict
    // Can C's interpretation of restrict be applied to C++ (and to
    // __restrict too)?
    int *__restrict x;
};

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

int rfoo(ParamR *__restrict a, ParamR *__restrict b) {
    *a->x = 5;
    *b->x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a->x + *b->x;
}
This happens for both C++ (__restrict) and C code (using the standard restrict).
How can I make Clang understand that the pointers will always point to disjoint storage?
It appears to be a bug. Well, I don't know if I should call it a bug, since the generated program still behaves correctly; let's say it is a missed opportunity in the optimizer.
I have tried a few workarounds, and the only thing that worked is to always pass the pointer as a restrict parameter, like so:
int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

// change this:
int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

// to this:
int rfoo2(ParamR a, ParamR b) {
    return rfoo(a.x, b.x);
}
Output from clang 12.0.0:
rfoo(ParamR, ParamR):                   # @rfoo(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, dword ptr [rdi]
        add     eax, 6
        ret
rfoo2(ParamR, ParamR):                  # @rfoo2(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, 11
        ret
Now, this is terribly inconvenient, especially for more complex code, but if the performance difference is that large and important and you can't switch to GCC, it might be something worth considering.
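For loop-heavy code, the same forwarding trick might look like the sketch below. It is only an illustration: add_into and demo are hypothetical names that do not appear in the question, and whether Clang actually vectorizes the loop depends on the compiler version and flags, so it is worth checking the generated assembly as was done above.

```cpp
#include <cassert>
#include <cstddef>

struct ParamR {
    int *__restrict x;
};

// Kernel: the restrict qualifiers sit on the function parameters themselves,
// which is the form that worked for Clang in the rfoo/rfoo2 example above.
static inline void add_into(int *__restrict dst, const int *__restrict src,
                            std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

// Thin wrapper (hypothetical): keeps the struct-based interface and only
// forwards the raw pointers to the kernel.
inline void add_into(ParamR dst, ParamR src, std::size_t n) {
    add_into(dst.x, src.x, n);
}

// Small usage check on two disjoint arrays.
inline int demo() {
    int a[3] = {1, 2, 3};
    int b[3] = {10, 20, 30};
    add_into(ParamR{a}, ParamR{b}, 3);
    assert(a[0] == 11);
    return a[0] + a[1] + a[2];
}
```

The wrapper keeps call sites unchanged, so only the hot kernels need the raw-pointer signature.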