Let's say we have two threads: one that gives a "go" signal and one that waits for that signal before producing something.
Is this code correct, or can I end up with an infinite loop because of caching or something like that?
#include <atomic>
#include <thread>

std::atomic_bool canGo{false};
void produce_data();   // assumed to be defined elsewhere

void producer() {
    // busy-wait until the launcher raises the flag
    while (canGo.load(std::memory_order_relaxed) == false)
        ;
    produce_data();
}

void launcher() {
    canGo.store(true, std::memory_order_relaxed);
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join();
    b.join();
}
If this code is not correct, is there a way to flush/invalidate the cache in standard C++?
Relaxed Ordering is a PCIe feature that allows flexibility in the ordering of transactions over the PCIe link. This reduces the number of retransmissions on the lane and can increase performance by up to 4 times.
PCI Express transaction ordering for native devices can be summarized with four simple rules; one of them is that PCI Express requires strong ordering of transactions (i.e., performing transactions in the order issued by software) flowing through the fabric that have the same TC (Traffic Class) assignment (see item 4 for the exception to this rule).
IDO (ID-based Ordering) enables the preservation of the producer-consumer programming model and helps prevent deadlocks in PCIe-based systems (potentially including bridges to PCI/PCI-X).
The PCI Express protocol includes a "no snoop required" attribute in the transaction descriptor. For a PCIe non-snooped read, the request can go directly to the DRAM controller to obtain the data.
A go signal like this will usually be in response to some memory changes that you'll want the target to see.
In other words, you'll usually want to give release/acquire semantics to such signaling.
That can be done either by using memory_order_release on the store and memory_order_acquire on the load, or by putting a release fence before the relaxed store and an acquire fence after the relaxed load, so that memory operations done by the signaller before the store are visible to the signallee (see, for example, https://preshing.com/20120913/acquire-and-release-semantics/ or the C/C++ standard).
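For example, here is a minimal self-contained sketch of the release/acquire variant applied to the code in the question (the payload variable, the value 42, and the join calls are illustrative additions, not part of the original snippet):

#include <atomic>
#include <iostream>
#include <thread>

std::atomic_bool canGo{false};
int payload = 0;   // non-atomic data published by the launcher

void producer() {
    // Spin until the signal is seen; the acquire load makes the
    // launcher's earlier write to payload visible once the loop exits.
    while (!canGo.load(std::memory_order_acquire))
        ;
    std::cout << payload << '\n';   // prints 42, never 0
}

void launcher() {
    payload = 42;                                  // plain write before the signal
    canGo.store(true, std::memory_order_release);  // publishes everything above
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join();
    b.join();
}

With both operations relaxed, as in the question, the read of payload would not be ordered with respect to the write and would be a data race; the release/acquire pair is what establishes that ordering.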
The way I remember the direction of the fences: shared-memory operations among cores are effectively hardware-implemented buffered I/O following a protocol, where a release fence acts roughly like an output-buffer flush and an acquire fence like an input-buffer flush/sync.
Now, if you flush your core's output buffer of memory operations before issuing the relaxed store, then by the time the target core sees that store, the preceding memory-operation messages must already be available to it; all it needs to do to see those changes in its own memory is to sync them in with an acquire fence after it observes the signalling store.
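Following that mental model, here is a minimal sketch of the fence-based variant (again, the payload variable, the value 42, and the joins are illustrative additions): the operations on canGo stay relaxed and the fences supply the ordering.

#include <atomic>
#include <iostream>
#include <thread>

std::atomic_bool canGo{false};
int payload = 0;   // non-atomic data published by the launcher

void producer() {
    // Relaxed spin on the flag itself.
    while (!canGo.load(std::memory_order_relaxed))
        ;
    // Acquire fence: "sync in" everything the launcher made
    // available before its release fence.
    std::atomic_thread_fence(std::memory_order_acquire);
    std::cout << payload << '\n';   // prints 42, never 0
}

void launcher() {
    payload = 42;
    // Release fence: "flush out" the write above before signalling.
    std::atomic_thread_fence(std::memory_order_release);
    canGo.store(true, std::memory_order_relaxed);
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join();
    b.join();
}

Here the release fence synchronizes with the acquire fence because the relaxed load observes the value written by the relaxed store that follows the release fence, which is exactly the fence-to-fence rule in the standard.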