
relaxed ordering as a signal

Let's say we have two threads: one that gives a "go" signal and one that waits for the go before producing something.

Is this code correct, or can I get an "infinite loop" because of caching or something like that?

#include <atomic>
#include <thread>

void produce_data();   // provided by the application

std::atomic_bool canGo{false};

void producer() {
    // spin until the launcher gives the go
    while (canGo.load(std::memory_order_relaxed) == false)
        ;
    produce_data();
}

void launcher() {
    canGo.store(true, std::memory_order_relaxed);
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join();
    b.join();
}

If this code is not correct, is there a way to flush/invalidate the cache in standard C++?

Antoine Morrier asked Jul 05 '19


1 Answer

A go signal like this will usually be in response to some memory changes that you'll want the target to see.

In other words, you'll usually want to give release/acquire semantics to such signaling.

That can be done either by using memory_order_release on the store and memory_order_acquire on the load, or by putting a release fence before the relaxed store and an acquire fence after the relaxed load, so that memory operations done by the signaller before the store are visible to the signallee (see, for example, https://preshing.com/20120913/acquire-and-release-semantics/ or the C/C++ standard).
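
A minimal sketch of the first variant, assuming the signaller publishes some ordinary data before giving the go (the payload variable and the value 42 are purely illustrative):

#include <atomic>
#include <cassert>
#include <thread>

std::atomic_bool canGo{false};
int payload = 0;   // ordinary, non-atomic data published by the signaller

void launcher() {
    payload = 42;                                  // the memory change the target should see
    canGo.store(true, std::memory_order_release);  // release store: publishes everything written above it
}

void producer() {
    while (!canGo.load(std::memory_order_acquire)) // acquire load: spin until the flag is seen
        ;
    assert(payload == 42);                         // guaranteed visible: the release store synchronizes with the acquire load
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join();
    b.join();
}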


The way I remember the ordering of the fences is that, as far as I understand, shared-memory operations among cores are effectively hardware-implemented buffered IO following a protocol: a release fence is roughly an output-buffer flush, and an acquire fence is roughly an input-buffer flush/sync.

Now, if you flush your core's memory-op output buffer before issuing a relaxed store, then when the target core sees that relaxed store, the preceding memory-op messages must already be available to it; all it needs to do to see those memory changes is sync them in with an acquire fence after it sees the signalling store.
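
Under that mental model, the fence-based variant of the same sketch might look like this (again, the payload is illustrative; the flag operations themselves stay relaxed):

#include <atomic>
#include <cassert>
#include <thread>

std::atomic_bool canGo{false};
int payload = 0;   // ordinary, non-atomic data

void launcher() {
    payload = 42;                                         // write the data
    std::atomic_thread_fence(std::memory_order_release);  // "flush the output buffer" before the signal
    canGo.store(true, std::memory_order_relaxed);         // the signalling store itself stays relaxed
}

void producer() {
    while (!canGo.load(std::memory_order_relaxed))        // relaxed spin on the flag
        ;
    std::atomic_thread_fence(std::memory_order_acquire);  // "sync the input buffer" after seeing the signal
    assert(payload == 42);                                // the two fences pair up, so the write is visible
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    a.join();
    b.join();
}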

PSkocik answered Sep 27 '22