Consider the following pseudocode:
expected = null;
if (variable == expected)
{
    atomic_compare_exchange_strong(
        &variable, expected, desired(), memory_order_acq_rel, memory_order_acq);
}
return variable;
Observe that there are no "acquire" semantics when the variable == expected check is performed.
It seems to me that desired will be called at least once in total, and at most once per thread. Furthermore, if desired never returns null, then this code will never return null.
Now, I have three questions:
1. Is the above necessarily true? That is, can we really have well-ordered reads of shared variables even in the absence of fences on every read?
2. Is it possible to implement this in C++? If so, how? If not, why? (Hopefully with a rationale, not just "because the standard says so".)
3. If the answer to (2) is yes, then is it also possible to implement this in C++ without requiring variable == expected to perform an atomic read of variable?
Basically, my goal is to understand if it is possible to perform lazy-initialization of a shared variable in a manner that has performance identical to that of a non-shared variable once the code has been executed at least once by each thread?
(This is somewhat of a "language-lawyer" question. So that implies the question isn't about whether this is a good or useful idea, but rather about whether it's technically possible to do this correctly.)
Regarding the question of whether it is possible to perform lazy initialisation of a shared variable in C++ with performance (almost) identical to that of a non-shared variable:
The answer is that it depends on the hardware architecture and on the implementation of the compiler and run-time environment. It is possible in at least some environments, in particular on x86 with GCC and Clang.
On x86, atomic reads can be implemented without memory fences. At the instruction level, an atomic read of an int compiles to the same plain load as a non-atomic read. Take a look at the following compilation unit:
#include <atomic>
std::atomic<int> global_value;
int load_global_value() { return global_value.load(std::memory_order_seq_cst); }
Although I used an atomic operation with sequential consistency (the default), there is nothing special in the generated code. The assembler code generated by GCC and Clang looks as follows:
load_global_value():
    movl global_value(%rip), %eax
    retq
I said almost identical, because there are other reasons that might impact the performance. For example:
Having said that, the recommended way to implement lazy initialisation is to use std::call_once. That should give you the best result for all compilers, environments and target architectures.
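// Members of the class that owns the gadget (needs <memory> and <mutex>).
std::once_flag _init;
std::unique_ptr<gadget> _gadget;

auto get_gadget() -> gadget&
{
    // The lambda runs exactly once, even under concurrent calls.
    std::call_once(_init, [this] { _gadget.reset(new gadget{...}); });
    return *_gadget;
}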
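For reference, here is a self-contained sketch of the same std::call_once pattern that should compile as written. The gadget struct, its value member, the gadget_holder class and the constructions() counter are all hypothetical names added for illustration; they are not part of the snippet above.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the gadget type used above.
struct gadget {
    int value = 42;
};

// Hypothetical owner class; the [this] capture above implies one exists.
class gadget_holder {
public:
    // Returns the lazily constructed gadget. The lambda passed to
    // std::call_once runs exactly once, even under concurrent calls.
    gadget& get_gadget() {
        std::call_once(init_, [this] {
            gadget_ = std::make_unique<gadget>();
            ++constructions_;  // instrumentation for this sketch only
        });
        assert(gadget_ != nullptr);
        return *gadget_;
    }

    // Number of times the initialiser ran; read it only when no other
    // thread can be inside get_gadget().
    int constructions() const { return constructions_; }

private:
    std::once_flag init_;
    std::unique_ptr<gadget> gadget_;
    int constructions_ = 0;
};
```

Calling get_gadget() repeatedly on the same gadget_holder returns a reference to the same object each time, and the initialiser runs only on the first call.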
This is undefined behavior. You're modifying variable, at least in some thread, which means that all accesses to variable must be protected. In particular, when you're executing the atomic_compare_exchange_strong in one thread, there is nothing to prevent another thread from seeing the new value of variable before it sees the writes that might have occurred in desired(). (atomic_compare_exchange_strong only guarantees ordering in the thread that executes it.)
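One way to avoid the unprotected read is to make every access to variable atomic, including the initial check. The following is a minimal sketch under stated assumptions, not a definitive implementation: it assumes variable is a std::atomic pointer, and the widget type, the get() function and the desired_calls counter are hypothetical names added for illustration. Note that C++ has no memory_order_acq; the sketch uses std::memory_order_acquire in its place.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload type; the question does not specify one.
struct widget {
    int value;
};

std::atomic<widget*> variable{nullptr};
std::atomic<int> desired_calls{0};  // instrumentation for this sketch only

// Hypothetical factory standing in for desired(); never returns nullptr.
widget* desired() {
    desired_calls.fetch_add(1, std::memory_order_relaxed);
    return new widget{7};
}

widget* get() {
    // Acquire load: a thread that observes a non-null pointer here also
    // observes the writes made to the object before it was published.
    widget* p = variable.load(std::memory_order_acquire);
    if (p == nullptr) {
        widget* fresh = desired();
        // p is passed by reference as the expected value. On failure it
        // is overwritten with the pointer another thread published.
        if (variable.compare_exchange_strong(p, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
            p = fresh;
        } else {
            delete fresh;  // lost the race; keep the winner's object
        }
    }
    assert(p != nullptr);
    return p;
}
```

In this sketch desired() runs at least once overall and at most once per thread, because a thread only calls it after loading a null pointer and the pointer is never reset to null afterwards. A thread that loses the race discards its own object and returns the published one.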