I have been optimising some code at work and stumbled upon some strange behaviour, which I've boiled down to the simple snippet of C++ code below:
#include <stdint.h>
void Foo(uint8_t*& out)
{
out[0] = 1;
out[1] = 2;
out[2] = 3;
out[3] = 4;
}
I then compile it with clang (on Windows) using: clang -S -O3 -masm=intel test.cpp. This results in the following assembly:
mov rax, qword ptr [rcx]
mov byte ptr [rax], 1
mov rax, qword ptr [rcx]
mov byte ptr [rax + 1], 2
mov rax, qword ptr [rcx]
mov byte ptr [rax + 2], 3
mov rax, qword ptr [rcx]
mov byte ptr [rax + 3], 4
ret
Why has clang generated code that repeatedly dereferences the out parameter into the rax register? This seems like a really obvious optimization that it is deliberately not making, so the question is why?
Interestingly, I've tried changing uint8_t to uint16_t, and this much better machine code is generated as a result:
mov rax, qword ptr [rcx]
movabs rcx, 1125912791875585
mov qword ptr [rax], rcx
ret
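As a sanity check on that constant: the movabs immediate is just the four 16-bit values 1, 2, 3, 4 packed into one 64-bit word, which the single qword store then writes in one go. A minimal sketch of the arithmetic, assuming a little-endian target such as x86-64 (Pack is a hypothetical helper, not part of the original code):

```cpp
#include <cstdint>

// Pack four 16-bit values the way they sit in memory on a little-endian
// target: element 0 occupies the lowest 16 bits of the 64-bit word.
constexpr std::uint64_t Pack(std::uint16_t a, std::uint16_t b,
                             std::uint16_t c, std::uint16_t d)
{
    return static_cast<std::uint64_t>(a)
         | (static_cast<std::uint64_t>(b) << 16)
         | (static_cast<std::uint64_t>(c) << 32)
         | (static_cast<std::uint64_t>(d) << 48);
}

// 0x0004000300020001 is the immediate shown in the assembly listing.
static_assert(Pack(1, 2, 3, 4) == 0x0004000300020001ULL, "hex form");
static_assert(Pack(1, 2, 3, 4) == 1125912791875585ULL, "decimal form");
```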
The compiler cannot perform this optimization because of the strict aliasing rules: uint8_t is always* defined as unsigned char, and a character type is allowed to alias an object of any type. The pointer can therefore point to any memory location, including the storage of out itself, and because out is passed by reference, every write through it may change out. The compiler has to reload the pointer after each store.
Here is an obscure, yet correct, usage that depends on non-cached reads:
#include <cassert>
#include <stdint.h>
void Foo(uint8_t*& out)
{
    uint8_t local;
    // CANNOT be used as a cached value further down in the code.
    uint8_t* tmp = out;
    // Recover the stored pointer.
    uint8_t** orig = reinterpret_cast<uint8_t**>(out);
    // CHANGES `out` itself;
    *orig = &local;
    **orig = 5;
    assert(local == 5);
    // IS NOT EQUAL even though we did not touch `out` at all;
    assert(tmp != out);
    assert(out == &local);
    assert(*out == 5);
}

int main()
{
    // True type of the stored ptr is uint8_t**
    uint8_t* ptr = reinterpret_cast<uint8_t*>(&ptr);
    Foo(ptr);
}
This also explains why uint16_t generates "optimized" code: uint16_t can never* be (unsigned) char, so the compiler is free to assume that it does not alias other types, such as the pointer out itself.
*Except perhaps on some irrelevant, obscure platforms with differently sized bytes. That is beside the point.
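If the self-aliasing case above is not something the caller relies on, one common way to avoid the repeated loads is to read the pointer once into a local variable and write through that copy. The sketch below assumes exactly that; FooLocalCopy is a hypothetical name, not code from the question. Because the address of the local p is never taken, no store through it can modify p, so the compiler is free to keep it in a register and, typically, to merge the four byte stores.

```cpp
#include <cstdint>

// Hypothetical variant of Foo: load the referenced pointer exactly once.
// Assumption: the caller does not depend on writes through `out`
// changing `out` itself (the self-aliasing trick shown above).
void FooLocalCopy(std::uint8_t*& out)
{
    std::uint8_t* p = out;  // single read of the referenced pointer
    p[0] = 1;
    p[1] = 2;
    p[2] = 3;
    p[3] = 4;
}
```

Note that this is a change in behaviour, not only in code generation: in the self-aliasing example, all four bytes would now land in the pointer's own storage instead of following the updated pointer. Passing the pointer by value (uint8_t* out) has the same effect when the function does not need to modify the caller's pointer.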