Local static objects in C++ are initialized once, the first time they are needed (which is relevant if the initialization has a side effect):
#include <iostream>

void once() {
    // The lambda runs only when b is initialized, i.e. on the first call.
    static bool b = [] {
        std::cout << "hello" << std::endl;
        return true;
    }();
}
once() will print "hello" the first time it is called, but not on any later call.
I've put a few variations of this pattern into Compiler Explorer and noticed that all of the big-name implementations (GCC, Clang, ICC, VS) essentially do the same thing: a hidden "guard variable for once()::b" is created and checked to see whether the primary variable needs to be initialized "this time"; if it does, it gets initialized and then the guard is set, so the next call won't jump out to the initialization code. For example (minimized by replacing the lambda with a call to extern bool init_b();):
once():
        movzx   eax, BYTE PTR guard variable for once()::b[rip]
        test    al, al
        je      .L16
        ret
.L16:
        push    rbx
        mov     edi, OFFSET FLAT:guard variable for once()::b
        call    __cxa_guard_acquire
        test    eax, eax
        jne     .L17
        pop     rbx
        ret
.L17:
        call    init_b()
        pop     rbx
        mov     edi, OFFSET FLAT:guard variable for once()::b
        jmp     __cxa_guard_release
        mov     rbx, rax
        mov     edi, OFFSET FLAT:guard variable for once()::b
        call    __cxa_guard_abort
        mov     rdi, rbx
        call    _Unwind_Resume
...from GCC 6.3 with -O3.
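For readers who would rather not trace the assembly, here is a rough hand-written C++ analogue of the same control flow. It is only a sketch under stated assumptions: b_guard, b_mutex, b_value, init_calls and run_demo are hypothetical names, init_b is given a trivial body so the example links on its own, and a std::mutex stands in for whatever locking __cxa_guard_acquire really uses inside the runtime.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>
#include <mutex>

// Hypothetical stand-ins for the compiler-generated pieces.
static int init_calls = 0;                // counts how often the initializer runs
static bool init_b() {                    // plays the role of extern bool init_b()
    ++init_calls;
    std::cout << "hello" << std::endl;
    return true;
}

static std::atomic<bool> b_guard{false};  // analogue of "guard variable for once()::b"
static std::mutex b_mutex;                // assumption: the real runtime locks differently
static bool b_value = false;              // storage for once()::b

void once_by_hand() {
    // Fast path: corresponds to the movzx / test / je at the top of the listing.
    if (b_guard.load(std::memory_order_acquire)) return;

    // Slow path, roughly __cxa_guard_acquire: only one thread may initialize.
    std::lock_guard<std::mutex> lock(b_mutex);
    if (b_guard.load(std::memory_order_relaxed)) return;  // another thread already finished

    // If init_b() throws, the guard stays clear, which is roughly what
    // __cxa_guard_abort arranges, so a later call will retry.
    b_value = init_b();

    // Roughly __cxa_guard_release: publish the value, then mark it initialized.
    b_guard.store(true, std::memory_order_release);
}

// Usage: several calls, a single initialization.
int run_demo() {
    once_by_hand();
    once_by_hand();
    once_by_hand();
    assert(init_calls == 1);
    assert(b_value);
    return init_calls;
}
```

The point of the sketch is that the check on the fast path is a single load and a well-predicted branch; all of the expensive work sits behind it.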
This isn't unreasonable, and I know that in practice conditional jumps are close to free anyway when the condition is consistent. However, my gut feeling would still have been to implement this by unconditionally jumping to the initialization code, which as its last action overwrites the originating jump with nop instructions. This is not necessarily an option on every platform, but the x86 family seems quite liberal about what you can read or write, and where.
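The shape of that idea can be imitated in portable C++ without touching machine code, by routing the call through a function pointer that the initializer overwrites as its last action. This is only an analogy, not real code patching: an indirect call replaces the patched jump, and once_target, first_call, do_nothing, init_calls and run_demo are hypothetical names. As written, it is also only meant for a single thread.

```cpp
#include <cassert>
#include <iostream>

static int init_calls = 0;     // counts how often the initializer runs
static bool b_value = false;   // the variable being initialized

static void do_nothing() {}    // analogue of the nop-filled jump
static void first_call();      // analogue of the initialization code

// Plays the role of the patchable jump: it starts out aimed at the initializer.
static void (*once_target)() = first_call;

static void first_call() {
    ++init_calls;
    std::cout << "hello" << std::endl;
    b_value = true;
    // Last action: redirect all future calls, mirroring "overwrite the jump with nops".
    once_target = do_nothing;
}

void once() {
    once_target();  // unconditional indirect call, no guard check
}

// Usage: the initializer runs on the first call only.
int run_demo() {
    once();
    once();
    once();
    assert(init_calls == 1);
    assert(b_value);
    return init_calls;
}
```

Note that the unsynchronized write to once_target is a data race as soon as two threads call once() at the same time.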
What's so wrong with this apparently-simple idea that no mainstream compiler uses it? (Or do I just need to try harder with my examples?)
This sort of "optimization" is not safe in a multithreaded environment, and may not be safe even in a single-threaded one.
Overwriting the jump with nops may well take more than one store instruction, so another thread could execute the jump while it is only partly overwritten.
The size of the jmp instruction may not be known until the final code is optimized (does it need an 8-, 16-, or 32-bit offset?).
Instruction caches do not necessarily pick up a change to code bytes unless one of a small set of instructions is executed to flush or resynchronize them.
And all that is assuming the code can be written to via the data segment.
On most modern operating systems, modifying the code loaded with the program causes problems: both performance problems (on some systems, unmodified code pages can be shared between many instances of a DLL) and security problems (it prevents the use of executable-space protection technologies).
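To make the multithreading concern concrete, here is a small sketch of the guarantee that the guard-based scheme has to uphold since C++11: if several threads make the first call at the same time, the initializer still runs exactly once and the others wait for it. The names init_runs and run_threads are hypothetical, and init_b is given a trivial body for the sake of a self-contained example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

static std::atomic<int> init_runs{0};  // counts how often the initializer runs

static bool init_b() {
    init_runs.fetch_add(1, std::memory_order_relaxed);
    return true;
}

void once() {
    // Since C++11, concurrent first calls block until one thread has finished
    // the initialization; this is the job of the guard and its acquire/release calls.
    static bool b = init_b();
    (void)b;  // silence unused-variable warnings
}

// Starts n threads that all call once(), then reports how often init_b() ran.
int run_threads(int n) {
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back(once);
    }
    for (auto& t : workers) {
        t.join();
    }
    assert(init_runs.load() == 1);
    return init_runs.load();
}
```

A scheme that patches the jump instead would have to give the same guarantee while other threads may be executing the very bytes that are being rewritten.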