As is known, PowerPC has a weak memory model that permits any speculative reordering: Store-Store, Load-Store, Store-Load, Load-Load.
There are at least 3 fences:
- hwsync (or sync): full memory barrier, prevents any reordering
- lwsync: memory barrier that prevents Load-Load, Store-Store and Load-Store reordering
- isync: instruction barrier: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.alangref/idalangref_isync_ics_instrs.htm
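To show where these three fences typically end up, here is a sketch of the widely cited C/C++11-to-POWER mapping expressed as a lookup table. It assumes that mapping (real compilers may emit different but equivalent code), uses the generic mnemonics `ld`/`st` for a plain load or store (for example `lwz`/`stw`), and the names `Access` and `power_sequence` are hypothetical. Note that `isync` appears only in the acquire-load path, after a compare and conditional branch that depend on the loaded value.

```cpp
#include <atomic>
#include <cassert>
#include <string>

enum class Access { Load, Store, Fence };

// Hypothetical helper: typical POWER sequences for C++11 atomics,
// following the commonly published mapping (an assumption, not a compiler contract).
std::string power_sequence(Access kind, std::memory_order order) {
    switch (kind) {
    case Access::Load:
        if (order == std::memory_order_relaxed) return "ld";
        if (order == std::memory_order_seq_cst) return "hwsync; ld; cmp; bc; isync";
        return "ld; cmp; bc; isync";  // acquire (consume treated as acquire)
    case Access::Store:
        if (order == std::memory_order_relaxed) return "st";
        if (order == std::memory_order_seq_cst) return "hwsync; st";
        return "lwsync; st";  // release
    case Access::Fence:
        if (order == std::memory_order_relaxed) return "";  // no-op
        if (order == std::memory_order_seq_cst) return "hwsync";
        return "lwsync";  // acquire, release, acq_rel fences
    }
    return "";
}
```

In this mapping only `hwsync` ever orders an earlier store against a later load; `lwsync` and the `bc; isync` idiom do not.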
For example, can Store-stwcx. and Load-lwz be reordered in this code? https://godbolt.org/g/84t5jM
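.L2:
lwarx 9,0,10     # load word and reserve from the address in r10
addi 9,9,2       # add 2
stwcx. 9,0,10    # store conditional: succeeds only if the reservation still holds
bne- 0,.L2       # reservation lost: retry
isync            # instruction barrier
lwz 9,8(1)       # load from the stack frame (r1 is the stack pointer)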
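The C++ source behind the godbolt link is not shown here, so the following is only a plausible reconstruction: an acquire `fetch_add`, which GCC for POWER typically compiles to the `lwarx`/`addi`/`stwcx.`/`bne-` retry loop followed by `isync`, and then an ordinary load from a stack slot. The function name and the `volatile` local are assumptions made to reproduce the shape of the listing.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reconstruction of code with the same shape as the listing above.
int add_then_read_local(std::atomic<int>& counter) {
    volatile int local = 40;  // volatile forces a real stack slot, read back with an r1-relative lwz
    // Acquire RMW: typically a lwarx/addi/stwcx./bne- loop followed by isync on POWER.
    int old = counter.fetch_add(2, std::memory_order_acquire);
    // This later plain load corresponds to the lwz after the isync.
    return old + local;
}
```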
As is known, isync prevents reordering lwarx,bne <--> any following instructions.
But does isync prevent reordering stwcx.,bne <--> any following instructions?
I.e. can Store-stwcx. begin earlier than the following Load-lwz, and finish being performed later than Load-lwz?
I.e. can Store-stwcx. perform its store to the store buffer before the following Load-lwz begins, while the actual store to the cache, which is visible to all CPU cores, occurs after Load-lwz has finished?
As we see from the following documents, articles and books:
- isync is not a memory fence; it is only an instruction fence.
- isync does not force all external accesses to complete with respect to other processors and mechanisms that access memory.
- isync does not wait for all other processors to detect storage accesses.
- isync is very low-overhead and very weak (weaker than lwsync and hwsync).
- isync does not guarantee that previous and future stores will be perceived by other processors in the locally issued order; that requires one of the sync instructions.
- isync is an acquire barrier, but as we know, acquire applies only to load operations, not to stores (stwcx.).
- isync does not affect data accesses and does not wait for all stores to be performed.
The main question. Initially a=0, b=0:
Core-0: stwcx. [a]=1; bne-; isync; lwz [b]
Core-1: hwsync; stw [b]=1; hwsync; lwz [a]; hwsync
Then can Core-0 see [b]==1 and Core-1 see [a]==0?
Also:
The isync prevents speculative execution from accessing the data block before the flag has been set. And in conjunction with the preceding load, compare, and conditional branch instructions, the isync guarantees that the load on which the branch depends (the load of the flag) is performed prior to any loads that occur subsequent to the isync (loads from the shared block). isync is not a memory barrier instruction, but the load-compare-conditional branch-isync sequence can provide this ordering property.
Unlike isync, sync forces all external accesses to complete with respect to other processors and mechanisms that access memory.
Unlike sync, isync does not wait for all other processors to detect storage accesses. isync is a less conservative fence than sync because it does not delay until all processors detect previous loads and stores.
bc;isync: this is a very low-overhead and very weak form of memory fence. A specific set of preceding loads on which the bc (branch conditional) instruction depends are guaranteed to have completed before any subsequent instruction begins execution. However, store-buffer and cache-state effects can nevertheless make it appear that subsequent loads occur before the preceding loads upon which the twi instruction depends. That said, the PowerPC architecture does not permit stores to be executed speculatively, so any store following the twi;isync instruction is guaranteed to happen after any of the loads on which the bc depends.
Note that isync does not affect data accesses and does not wait for all stores to be performed.
3.5.7.2 Instruction Cache Block Invalidate (icbi)
As a result of this and other implementation-specific design optimizations, instead of requiring the instruction sequence specified by the Power ISA to be executed on a per cache-line basis, software must only execute a single sequence of three instructions to make any previous code modifications become visible:
sync, icbi (to any address), isync.
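The load-compare-conditional branch-isync sequence quoted above is what a C++ acquire load typically compiles to on POWER (`ld; cmp; bc; isync`), while a release store typically becomes `lwsync; st`. A minimal message-passing sketch under that assumed mapping follows; `consume_after_flag` is a hypothetical name. The guarantee it relies on is only that the flag load is ordered before the later data load on the consumer side, which is exactly what the sequence provides; it says nothing about ordering an earlier store on the consumer before a later load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: flag-protected data handoff using release/acquire.
int consume_after_flag() {
    int data = 0;               // plain, non-atomic shared block
    std::atomic<int> flag{0};
    std::thread producer([&] {
        data = 42;
        flag.store(1, std::memory_order_release);  // typically lwsync; st on POWER
    });
    // Typically ld; cmp; bc; isync on POWER: later loads cannot be satisfied early.
    while (flag.load(std::memory_order_acquire) != 1) {
    }
    int seen = data;  // guaranteed to observe 42 once the flag is seen
    producer.join();
    return seen;
}
```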
ANSWER:
So isync doesn't guarantee Store-Load ordering: because "isync is not a memory barrier instruction", it doesn't guarantee that previous stores become visible to other CPU cores (as sequential consistency would require) before the next instruction finishes. The instruction synchronization instruction isync guarantees only the order in which instructions start, not the order in which they complete, i.e. not the order in which their effects become visible to other CPU cores. Thus isync allows the visible effects of the Store and the Load to be reordered in this code: stwcx. [a]=1; bne-; isync; lwz [b].
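The main question can be turned into a runnable C++ litmus test. This sketch assumes the usual compiler mapping, where an acquire `exchange` becomes a `lwarx`/`stwcx.`/`bne-`/`isync` loop and `std::atomic_thread_fence(std::memory_order_seq_cst)` becomes `hwsync`; the harness and `count_both_zero` are hypothetical. The outcome that can only arise from Store-Load reordering on Core-0 is both loads returning 0. The C++ model allows it when Core-0 has only the acquire (isync-style) barrier and forbids it once Core-0 also executes a seq_cst fence. Because fresh threads are created on every iteration they rarely overlap, so even on POWER this naive harness may report few or no reorderings.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus harness: counts runs in which both cores read 0.
int count_both_zero(int iterations, bool core0_full_fence) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> a{0}, b{0};
        int r0 = -1, r1 = -1;
        std::thread core0([&] {
            a.exchange(1, std::memory_order_acquire);  // like stwcx. [a]=1; bne-; isync
            if (core0_full_fence)
                std::atomic_thread_fence(std::memory_order_seq_cst);  // typically hwsync
            r0 = b.load(std::memory_order_relaxed);  // lwz [b]
        });
        std::thread core1([&] {
            b.store(1, std::memory_order_relaxed);                // stw [b]=1
            std::atomic_thread_fence(std::memory_order_seq_cst);  // hwsync
            r1 = a.load(std::memory_order_relaxed);               // lwz [a]
        });
        core0.join();
        core1.join();
        if (r0 == 0 && r1 == 0) ++both_zero;
    }
    return both_zero;
}
```

`count_both_zero(n, true)` must return 0 on any conforming implementation; `count_both_zero(n, false)` may legitimately return a nonzero count on hardware with store buffers, although it is never guaranteed to.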
As you have guessed, and as most of your excellent sources imply, there are two properties of a memory access involved here:
- Visibility: whether other processors can observe the memory access.
  The use of processor-specific buffers or caches can make a store complete on one processor yet not be visible to the others.
- Ordering: when the memory access is executed with respect to other instructions on the same processor.
Ordering is an intra-processor aspect of a memory access; it controls the out-of-order capability of a processor.
Ordering cannot be enforced with respect to other processors' instructions.
Visibility is an inter-processor aspect; it ensures that the side effects of a memory access are visible to other processors (or, in general, to other agents).
A store's primary side effect is changing a memory location.
By controlling both aspects it is possible to enforce inter-processor ordering, that is, the order in which other processors see a sequence of memory accesses.
It goes without saying that the word "ordering" usually refers to this second meaning unless used in a context where no other agents are present.
It is admittedly confusing terminology.
Beware that I'm not an expert in the PowerPC architecture; I'm just applying the theory with the help of a few official documents found online and the quotes you provided.
isync, just like sc and rfi, is a Context-Synchronizing instruction; the main purpose of these instructions is to guarantee that subsequent instructions execute in the context established by the previous ones.
For example, executing a system call changes the context, and we don't want privileged code to execute in an unprivileged context or vice versa.
These instructions wait for all previously dispatched instructions to be completed, but not to be visible:
All previously issued instructions have completed, at least to a point where they can no longer cause an exception.
However, memory accesses that these instructions cause need not have completed with respect to other processors and mechanisms.
So, depending on what you mean by reordering, isync does or does not prevent Load-Load, Load-Store, etc. reordering.
It does prevent any such reordering from the perspective of the processor it is executed on (intra-processor reordering): all previous loads and stores are completed before isync completes, but they are not necessarily visible.
It does not prevent reordering from the perspective of other processors (inter-processor reordering), as it doesn't ensure the visibility of previous instructions.
But does isync prevent reordering stwcx.,bne <--> any following instructions?
Only intra-processor reordering.
I.e. can Store-stwcx. begin earlier than the following Load-lwz, and finish being performed later than Load-lwz?
Not from the point of view of the processor executing them: stwcx. is completed by the time lwz begins but, using Intel terminology, it is completed locally; other processors may not see it completed by the time lwz begins.
I.e. can Store-stwcx. perform its store to the store buffer before the following Load-lwz begins, while the actual store to the cache, which is visible to all CPU cores, occurs after Load-lwz has finished?
Yes, exactly.
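The store-buffer scenario can be made concrete with a toy model. This is a deliberately simplified sketch, not the PowerPC memory model (POWER is not even multi-copy atomic, which a single shared memory plus per-core store buffers cannot express); all types and the functions `explore` and `run_sb` are hypothetical. Each core has a one-entry store buffer with store-to-load forwarding, a full fence waits until the buffer has drained, and `explore` enumerates every interleaving of instruction steps and buffer drains. Core-0 without a fence plays the role of `isync`, which does not drain the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <utility>
#include <vector>

enum class Op { Store, Fence, Load };

struct Instr {
    Op op;
    int addr;   // 0 = a, 1 = b
    int value;  // used by Store only
};

struct Core {
    std::vector<Instr> prog;
    std::size_t pc = 0;
    std::optional<std::pair<int, int>> buffered;  // (addr, value) not yet visible to others
    int reg = -1;                                 // result of the core's single Load
};

struct State {
    int mem[2] = {0, 0};  // shared memory visible to both cores
    Core core[2];
};

using Outcome = std::pair<int, int>;  // (Core-0's load, Core-1's load)

// Depth-first enumeration of all interleavings; records final register values.
void explore(const State& s, std::set<Outcome>& out) {
    bool progressed = false;
    for (int c = 0; c < 2; ++c) {
        const Core& k = s.core[c];
        if (k.buffered) {  // drain the store buffer: the store becomes globally visible
            State n = s;
            n.mem[k.buffered->first] = k.buffered->second;
            n.core[c].buffered.reset();
            explore(n, out);
            progressed = true;
        }
        if (k.pc >= k.prog.size()) continue;
        const Instr& in = k.prog[k.pc];
        State n = s;
        Core& nk = n.core[c];
        switch (in.op) {
        case Op::Store:
            if (k.buffered) continue;  // one-entry buffer is full: must drain first
            nk.buffered = std::make_pair(in.addr, in.value);
            break;
        case Op::Fence:
            if (k.buffered) continue;  // full fence (hwsync-like) waits for the drain
            break;
        case Op::Load:  // forward from own buffer if it holds this address, else read memory
            nk.reg = (k.buffered && k.buffered->first == in.addr) ? k.buffered->second
                                                                  : s.mem[in.addr];
            break;
        }
        ++nk.pc;
        explore(n, out);
        progressed = true;
    }
    if (!progressed) out.insert(Outcome{s.core[0].reg, s.core[1].reg});
}

// Core-0: a=1; [fence]; load b.  Core-1: b=1; fence; load a.
std::set<Outcome> run_sb(bool core0_fence) {
    State s;
    s.core[0].prog = {{Op::Store, 0, 1}};
    if (core0_fence) s.core[0].prog.push_back({Op::Fence, 0, 0});
    s.core[0].prog.push_back({Op::Load, 1, 0});
    s.core[1].prog = {{Op::Store, 1, 1}, {Op::Fence, 0, 0}, {Op::Load, 0, 0}};
    std::set<Outcome> out;
    explore(s, out);
    return out;
}
```

In this model `run_sb(false)` contains the outcome (0, 0), where Core-0's load of b completes while its store to a is still sitting in its buffer, whereas `run_sb(true)`, where Core-0 must drain its buffer before loading as a full fence would require, does not.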