
Clang's __restrict is inconsistent?

I was working on highly "vectorizable" code and noticed that Clang handles the C++ __restrict keyword/extension differently from GCC, in a way that is impractical even in a simple case.

For compiler-generated code, the slowdown is about 15x (in my specific case, not the example below).

Here is the code (also available at https://godbolt.org/z/sdGd43x75):

struct Param {
    int *x;
};

int foo(int *a, int *b) {
    *a = 5;
    *b = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int foo(Param a, Param b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a.x + *b.x;
}

/////////////////////

struct ParamR {
    // "Restricted pointers assert that members point to disjoint storage"
    // (https://en.cppreference.com/w/c/language/restrict). Does C's
    // interpretation of restrict also apply in C++ (and to __restrict)?
    int *__restrict x;
};

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

int rfoo(ParamR *__restrict a, ParamR *__restrict b) {
    *a->x = 5;
    *b->x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a->x + *b->x;
}

This happens for both C++ (__restrict) and C code (using the standard restrict).

How can I make Clang understand that the pointers will always point to disjoint storage?

Etienne M asked Nov 06 '22 23:11


1 Answer

It appears to be a bug. Well, I don't know if I should call it a bug, since the generated program still behaves correctly; let's say it is a missed opportunity in the optimizer.
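
To make the "correct behavior" point concrete, here is a minimal sketch that reuses the definitions from the question. The helper same_result is hypothetical and exists only for illustration. With disjoint targets both overloads return 11, so the difference is only in how that value gets computed (a reload from memory versus a folded constant), not in what it is.

```cpp
struct ParamR {
    int *__restrict x;
};

// Pointer form from the question: the result can be folded to a constant.
int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    return *a + *b;
}

// Struct form from the question: Clang may reload *a.x after the second
// store, but the returned value is the same when the targets are disjoint.
int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    return *a.x + *b.x;
}

// Hypothetical helper (not part of the question): calls both overloads on
// disjoint ints and reports whether each one returned 5 + 6.
bool same_result() {
    int p = 0, q = 0, r = 0, s = 0;
    int via_ptr = rfoo(&p, &q);
    int via_struct = rfoo(ParamR{&r}, ParamR{&s});
    return via_ptr == 11 && via_struct == 11;
}
```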

I tried a few workarounds, and the only one that worked is to always pass the pointers as restrict parameters, like so:

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

// change this:
int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

// to this:
int rfoo2(ParamR a, ParamR b) {
    return rfoo(a.x, b.x);
}

Output from clang 12.0.0:

rfoo(ParamR, ParamR):                       # @rfoo(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, dword ptr [rdi]
        add     eax, 6
        ret
rfoo2(ParamR, ParamR):                      # @rfoo2(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, 11
        ret

Now, this is terribly inconvenient, especially for more complex code, but if the performance difference is that large and that important, and you can't switch to GCC, it might be worth considering.
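
For loop-heavy code, the same forwarding trick might look like the sketch below. All names here (Buffers, scale_kernel, scale) are made up for illustration and do not come from the question. Whether Clang actually vectorizes the loop better this way depends on the compiler version and flags, so treat it as something to verify in the generated assembly rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical parameter struct, in the spirit of ParamR from the question.
struct Buffers {
    float *__restrict dst;
    const float *__restrict src;
    std::size_t n;
};

// The kernel receives the pointers as __restrict *parameters*, which is the
// form that both compilers optimized in the question.
static void scale_kernel(float *__restrict dst,
                         const float *__restrict src,
                         std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

// Thin wrapper: unpack the struct once and forward the raw pointers.
inline void scale(Buffers b, float k) {
    // Cheap sanity check only; it does not detect partial overlap.
    assert(b.n == 0 || b.dst != b.src);
    scale_kernel(b.dst, b.src, b.n, k);
}
```

The idea is to keep the struct at the API boundary and do all the hot work in functions that take the restrict-qualified pointers directly, for example scale(Buffers{out, in, count}, 2.0f) with out and in being non-overlapping arrays.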

bolov answered Nov 13 '22 05:11