Here is the postbox example for data communication between two ARM cores (taken directly from the ARM Cortex-A Series Programmer's Guide).
Core A:
STR R0, [Msg] @ write some new data into postbox
STR R1, [Flag] @ new data is ready to read
Core B:
Poll_loop:
LDR R1, [Flag]
CMP R1,#0 @ is the flag set yet?
BEQ Poll_loop
LDR R0, [Msg] @ read new data.
To enforce the ordering, the document says that we need to insert not one but two memory barriers (DMB) into the code.
Core A:
STR R0, [Msg] @ write some new data into postbox
DMB
STR R1, [Flag] @ new data is ready to read
Core B:
Poll_loop:
LDR R1, [Flag]
CMP R1,#0 @ is the flag set yet?
BEQ Poll_loop
DMB
LDR R0, [Msg] @ read new data.
I understand the first DMB, on Core A: it prevents compiler reordering and also ensures that the memory access to the [Msg] variable is observed by the system. Below is the definition of DMB from the same document.
Data Memory Barrier (DMB)
This instruction ensures that all memory accesses in program order before the barrier are observed in the system before any explicit memory accesses that appear in program order after the barrier. It does not affect the ordering of any other instructions executing on the core, or of instruction fetches.
However, I am not sure why the DMB in the Core B is used. In the document it says:
Core B requires a DMB before the LDR R0, [Msg] to be sure that the message is not read until the flag is set.
If the DMB on Core A makes the store to [Msg] observable to the system, then we should not need the DMB on the second core. My guess is that the compiler might reorder the reads of [Flag] and [Msg] on Core B (though I do not understand why it would do this, since the read of [Msg] depends on [Flag]).
If this is the case, a compiler barrier (asm volatile("" ::: "memory")) instead of DMB should be enough. Am I missing something here?
Both barriers are necessary, and do need to be dmbs - this is still about the hardware memory model, and nothing to do with compiler reordering.
Let's look at the writer on core A first:
STR R0, [Msg] @ write some new data into postbox
STR R1, [Flag] @ new data is ready to read
Since these are two independent stores to different addresses with no dependency between them, there is nothing to force core A to actually issue the stores in program order. The store to Msg could, say, linger in a part-filled write buffer whilst the store to Flag overtakes it and goes straight out to the memory system. Thus any observer other than core A could see the new value of Flag, without yet seeing the new value of Msg.
STR R0, [Msg] @ write some new data into postbox
DMB
STR R1, [Flag] @ new data is ready to read
Now, with the barrier, the store to Flag is not permitted to be visible before the store to Msg, because that would necessitate one or other store appearing to cross the barrier. Thus any external observer may either see both old values, the new Msg but the old Flag, or both new values. The previous case of seeing the new Flag but the old Msg can no longer occur.
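For anyone writing this in C rather than assembly, here is a minimal sketch of the writer side using C11 atomics. The names msg, flag, post_write and post_write_compiler_only are made up for illustration and are not from the ARM guide. It also shows the compiler-only variant for contrast: that one constrains the compiler but asks the hardware for nothing, so the stores can still become visible out of order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static uint32_t msg;      /* stands in for [Msg]  (hypothetical name) */
static atomic_uint flag;  /* stands in for [Flag] (hypothetical name) */

/* Writer with a real hardware barrier between the two stores.
 * On ARMv7-A, GCC and Clang typically lower this fence to a DMB,
 * but that is a property of the toolchain, not of the C standard. */
void post_write(uint32_t value)
{
    /* Assumed protocol: the postbox must be empty before posting. */
    assert(atomic_load_explicit(&flag, memory_order_relaxed) == 0);

    msg = value;                                            /* STR R0, [Msg]  */
    atomic_thread_fence(memory_order_release);              /* DMB            */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);  /* STR R1, [Flag] */
}

/* Same stores, but with a compiler-only barrier. The compiler may not
 * reorder the stores, yet no barrier instruction is emitted, so another
 * core may still observe the new flag before the new msg. */
void post_write_compiler_only(uint32_t value)
{
    assert(atomic_load_explicit(&flag, memory_order_relaxed) == 0);

    msg = value;
    atomic_signal_fence(memory_order_release);              /* no DMB here */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
}
```

Run on a single thread, both functions behave identically; the difference only shows up in what another core is allowed to observe, which is exactly why a compiler barrier alone does not answer the question.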
OK, so the first barrier handles things getting written in the correct order, but there's also the matter of how they are read. Over on core B...
Poll_loop:
LDR R1, [Flag]
CMP R1,#0 @ is the flag set yet?
BEQ Poll_loop
LDR R0, [Msg] @ read new data.
Note that the branch to Poll_loop does not form a control dependency between the two loads; if you consider program order, the load of Msg is unconditional, and the value of Flag does not affect whether it is executed or not, only whether execution ever progresses to that part of the program at all. Therefore the code could equivalently be written thus:
Poll_loop:
LDR R1, [Flag]
LDR R0, [Msg] @ read data, just in case.
CMP R1,#0 @ is the flag set yet?
BEQ Poll_loop @ no? OK, throw away that data and read everything again.
... @ do stuff with R0, because Flag was set so it must be good data, right?
Start to see the problem? Even with the original code, core B is free to speculatively load Msg as soon as it reaches Poll_loop, so even if the stores from core A become visible in program order, things could still play out like this:
core A | core B
-----------+-----------
| load Msg
store Msg |
store Flag |
| load Flag
| conclude that old Msg is valid
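If it helps to see that timeline concretely, it can be replayed by hand in a single-threaded model. This is only a sketch: the struct and function names are invented for illustration, and the "memory" is just a local variable, not real inter-core behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the shared postbox; not real hardware state. */
struct postbox { uint32_t msg; uint32_t flag; };

/* What core B ends up holding in its registers. */
struct observed { uint32_t msg; uint32_t flag; };

/* Replays the four events of the timeline in exactly the order shown. */
struct observed replay_early_load(uint32_t old_msg, uint32_t new_msg)
{
    /* The stale read is only visible if the two values differ. */
    assert(old_msg != new_msg);

    struct postbox mem = { .msg = old_msg, .flag = 0 };
    struct observed b;

    b.msg = mem.msg;     /* core B: load Msg (performed early) */
    mem.msg = new_msg;   /* core A: store Msg                  */
    mem.flag = 1;        /* core A: store Flag                 */
    b.flag = mem.flag;   /* core B: load Flag                  */

    return b;            /* flag is set, yet msg is the old value */
}
```

Calling replay_early_load(1, 2) hands back a set flag together with the old message value 1, even though core A's two stores happened in program order.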
Thus you either need a barrier:
...
BEQ Poll_loop
DMB
LDR R0, [Msg] @ read new data.
or perhaps a fake address dependency:
...
BEQ Poll_loop
EOR R1, R1, R1
LDR R0, [Msg, R1] @ read new data.
to order the two loads against each other.
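The barrier version of the reader could be sketched in C11 roughly as follows. Again, msg, flag, post_read and post_read_selfcheck are illustrative names of my own, and the mapping of the acquire fence to a DMB is what GCC and Clang typically do on ARMv7-A rather than something the C standard promises. The fake address dependency has no reliable C equivalent, so only the barrier form is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static uint32_t msg;      /* stands in for [Msg]  (hypothetical name) */
static atomic_uint flag;  /* stands in for [Flag] (hypothetical name) */

/* Reader: spin until the flag is set, then fence, then read the data. */
uint32_t post_read(void)
{
    /* Poll_loop: LDR R1, [Flag] / CMP / BEQ */
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0) {
        /* busy-wait */
    }

    /* DMB: the load of msg below may not be satisfied before the
     * load of flag that ended the loop. */
    atomic_thread_fence(memory_order_acquire);

    return msg;           /* LDR R0, [Msg] */
}

/* Single-threaded smoke test. It only checks the plumbing; it cannot
 * demonstrate the ordering guarantee, which needs a second core. */
void post_read_selfcheck(void)
{
    msg = 7;
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    assert(post_read() == 7);
}
```

The acquire fence only does its job when the writer has a matching release fence (or DMB) between its two stores, which is the same conclusion as the assembly: one barrier on each side.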