
Why is clang dereferencing a parameter on every use?

I have been performing performance optimisations on some code at work, and stumbled upon some strange behaviour, which I've boiled down to the simple snippet of C++ code below:

#include <stdint.h>

void Foo(uint8_t*& out)
{
    out[0] = 1;
    out[1] = 2;
    out[2] = 3;
    out[3] = 4;
}

I then compile it with clang (on Windows) with the following: clang -S -O3 -masm=intel test.cpp. This results in the following assembly:

        mov     rax, qword ptr [rcx]
        mov     byte ptr [rax], 1
        mov     rax, qword ptr [rcx]
        mov     byte ptr [rax + 1], 2
        mov     rax, qword ptr [rcx]
        mov     byte ptr [rax + 2], 3
        mov     rax, qword ptr [rcx]
        mov     byte ptr [rax + 3], 4
        ret

Why has clang generated code that repeatedly dereferences the out parameter into the rax register? This seems like a really obvious optimization that it is deliberately not making, so the question is why?

Interestingly, I've tried changing uint8_t to uint16_t and this much better machine code is generated as a result:

        mov     rax, qword ptr [rcx]
        movabs  rcx, 1125912791875585
        mov     qword ptr [rax], rcx
        ret
Asked by Alexander Rafferty, Oct 14 '20.

1 Answer

The compiler cannot perform this optimization because of strict aliasing: uint8_t is always* defined as unsigned char, and unsigned char may alias an object of any type. It can therefore point to any memory location, including the storage of out itself, and because out is passed by reference, every write through it could modify out. The compiler must reload the pointer after each store.

Here is an obscure, yet correct, usage that depends on these non-cached reads:

#include <cassert>
#include <stdint.h>

void Foo(uint8_t*& out)
{
    uint8_t local;
    // CANNOT be used as a cached value further down in the code.
    uint8_t* tmp = out;
    // Recover the stored pointer: the caller really stored a uint8_t** here.
    uint8_t** orig = reinterpret_cast<uint8_t**>(out);
    // CHANGES `out` itself.
    *orig = &local;

    **orig = 5;
    assert(local == 5);
    // NOT EQUAL, even though we never touched `out` directly.
    assert(tmp != out);
    assert(out == &local);
    assert(*out == 5);
}

int main()
{
    // The true type of the stored pointer is uint8_t**: ptr points to itself.
    uint8_t* ptr = reinterpret_cast<uint8_t*>(&ptr);

    Foo(ptr);
}

This also explains why uint16_t generates "optimized" code: uint16_t can never* be (unsigned) char, so the compiler is free to assume that writes through it do not alias other pointer types, including the pointer itself.

*Except perhaps on some irrelevant, obscure platforms with differently-sized bytes; that is beside the point.

Answered by Quimby, Sep 28 '22.