
Does `isync` prevent Store-Load reordering on CPU PowerPC?

As is well known, PowerPC has a weak memory model that permits any speculative reordering: Store-Store, Load-Store, Store-Load, Load-Load.

There are at least three fences:

  • hwsync or sync - a full memory barrier that prevents any reordering
  • lwsync - a memory barrier that prevents Load-Load, Store-Store and Load-Store reordering
  • isync - an instruction barrier: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.alangref/idalangref_isync_ics_instrs.htm
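In C++ terms, these fences roughly correspond to std::atomic_thread_fence calls. The sketch below is only an illustration: the names publish, try_consume, flag and payload are made up for this example, and the comments about which instruction is emitted describe the lowering commonly documented for Power compilers, not a guarantee.

```cpp
#include <atomic>

// Hypothetical shared state for the illustration.
std::atomic<int> flag{0};
int payload = 0;

// Writer: make the payload visible before the flag.
void publish(int value) {
    payload = value;
    // Release fence: commonly lowered to lwsync on Power (assumption).
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(1, std::memory_order_relaxed);
}

// Reader: only touch the payload after the flag has been observed.
bool try_consume(int& out) {
    if (flag.load(std::memory_order_relaxed) == 0) {
        return false;
    }
    // Acquire fence: commonly lowered to lwsync on Power (assumption).
    std::atomic_thread_fence(std::memory_order_acquire);
    out = payload;
    return true;
}

// Full fence: commonly lowered to hwsync (sync) on Power (assumption).
// This is the only one of the three that also orders Store-Load.
void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}
```

Note that there is no standalone C++ fence that maps to isync; compilers typically emit it only as part of a load/compare/branch sequence for acquire operations.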

For example, can Store-stwcx. and Load-lwz be reordered in this code? https://godbolt.org/g/84t5jM

    lwarx 9,0,10
    addi 9,9,2
    stwcx. 9,0,10
    bne- 0,.L2
    isync
    lwz 9,8(1)
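The C++ source behind the Compiler Explorer link is not reproduced here. As an assumption, a fragment of the following shape is the kind of code that compilers commonly turn into a lwarx / stwcx. / bne- / isync loop followed by a plain load on PowerPC; the names counter and add_two_then_load are hypothetical.

```cpp
#include <atomic>

// Hypothetical counter that the read-modify-write operates on.
std::atomic<int> counter{0};

// Atomic add with acquire semantics, followed by an unrelated load.
// On PowerPC the fetch_add is typically a lwarx/stwcx. retry loop and the
// acquire part is typically implemented with the trailing isync (assumption).
int add_two_then_load(const std::atomic<int>& other) {
    counter.fetch_add(2, std::memory_order_acquire);
    // The load whose ordering relative to the stwcx. is in question.
    return other.load(std::memory_order_relaxed);
}
```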

As is known, isync prevents reordering of lwarx, bne <--> any following instructions.

But does isync prevent reordering of stwcx., bne <--> any following instructions?

I.e. can Store-stwcx. begin earlier than the following Load-lwz, and finish being performed later than Load-lwz?

I.e. can Store-stwcx. perform the Store to the Store-Buffer before the following Load-lwz begins, while the actual Store to the cache, visible to all CPU cores, occurs after the Load-lwz has finished?

As we see from the following documents, articles and books:

  • isync is not a memory fence; it is only an instruction fence.

  • isync does not force all external accesses to complete with respect to other processors and mechanisms that access memory.

  • isync does not wait for all other processors to detect storage accesses

  • isync is a very low-overhead and very weak fence (weaker than lwsync and hwsync)

  • isync does not guarantee that previous and future stores will be perceived by other processors in the locally issued order - that requires one of the sync instructions.

  • isync is an acquire barrier, but as we know, acquire can be applied only to Load operations, not to Stores (stwcx.)

  • isync does not affect data accesses and does not wait for all stores to be performed.

The main question: initially, a=0, b=0

  • If CPU-Core-0 does: stwcx. [a]=1; bne-; isync; lwz [b].
  • And CPU-Core-1 does: hwsync; stw [b]=1; hwsync; lwz [a]; hwsync.

Then can Core-0 see [b]==1 and Core-1 see [a]==0?
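The two-core scenario can be written as a small litmus test. This is a minimal sketch, assuming that fetch_add with acquire ordering stands in for the stwcx./bne-/isync sequence and that seq_cst fences stand in for hwsync; the function name run_litmus and the outcome encoding are made up for this example. One thing worth noting when reading the counts: the outcome asked about (Core-0 reads b==1, Core-1 reads a==0) can also arise from a plain interleaving in which Core-1 finishes before Core-0 stores, whereas both loads returning 0 is the outcome that a sequentially consistent interleaving cannot produce.

```cpp
#include <array>
#include <atomic>
#include <thread>

// Runs the two-thread scenario `iterations` times and counts the outcomes.
// Index = (value of b read by core 0) * 2 + (value of a read by core 1).
std::array<long, 4> run_litmus(int iterations) {
    std::array<long, 4> counts{};
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> a{0};
        std::atomic<int> b{0};
        int b_seen_by_core0 = 0;
        int a_seen_by_core1 = 0;

        std::thread core0([&] {
            // Stand-in for: stwcx. [a]=1; bne-; isync (assumption).
            a.fetch_add(1, std::memory_order_acquire);
            b_seen_by_core0 = b.load(std::memory_order_relaxed);
        });
        std::thread core1([&] {
            // Stand-in for: hwsync; stw [b]=1; hwsync; lwz [a]; hwsync.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            b.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            a_seen_by_core1 = a.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
        });

        core0.join();
        core1.join();
        ++counts[b_seen_by_core0 * 2 + a_seen_by_core1];
    }
    return counts;
}
```

Because each iteration starts fresh threads, the two bodies rarely overlap in time, so this harness is a weak detector; a dedicated litmus tool would be needed for a meaningful result on real hardware.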


Also:

  1. https://www.ibm.com/developerworks/systems/articles/powerpc.html

The isync prevents speculative execution from accessing the data block before the flag has been set. And in conjunction with the preceding load, compare, and conditional branch instructions, the isync guarantees that the load on which the branch depends (the load of the flag) is performed prior to any loads that occur subsequent to the isync (loads from the shared block). isync is not a memory barrier instruction, but the load-compare-conditional branch-isync sequence can provide this ordering property.

  2. http://www.nxp.com/assets/documents/data/en/application-notes/AN2540.pdf

Unlike isync, sync forces all external accesses to complete with respect to other processors and mechanisms that access memory.

  3. Storage in the PowerPC, Janice M. Stone, Robert P. Fitzgerald, 1995: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.4033&rep=rep1&type=pdf

Unlike sync , isync does not wait for all other processors to detect storage accesses. isync is a less conservative fence than sync because it does not delay until all processors detect previous loads and stores.

  4. http://open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html

bc;isync: this is a very low-overhead and very weak form of memory fence. A specific set of preceding loads on which the bc (branch conditional) instruction depends are guaranteed to have completed before any subsequent instruction begins execution. However, store-buffer and cache-state effects can nevertheless make it appear that subsequent loads occur before the preceding loads upon which the twi instruction depends. That said, the PowerPC architecture does not permit stores to be executed speculatively, so any store following the twi;isync instruction is guaranteed to happen after any of the loads on which the bc depends.

  5. https://books.google.ru/books?id=TKOfDQAAQBAJ&pg=PA264&lpg=PA264&dq=isync+store+load&source=bl&ots=-4FyWvxTwg&sig=r1fitaG-Q3GHOxvSMTgLJMBVGUU&hl=ru&sa=X&ved=0ahUKEwiKjYK97urTAhUJ_iwKHbfMA58Q6AEIOjAC#v=onepage&q=isync%20store%20load&f=false

  6. https://books.google.ru/books?id=gZZgAQAAQBAJ&pg=PA71&lpg=PA71&dq=isync+store+load&source=bl&ots=bo6nTLdzEZ&sig=vCjoDmUWhn0buN_uMf8XgbDzCf4&hl=ru&sa=X&ved=0ahUKEwiKjYK97urTAhUJ_iwKHbfMA58Q6AEIcTAJ#v=onepage&q=isync%20store%20load&f=false

  7. https://books.google.ru/books?id=G2fmCgAAQBAJ&pg=PA321&lpg=PA321&dq=isync+store+load&source=bl&ots=YS4mE-4f_F&sig=OVwaJYE-SNnor-KtKrjlkOd6AOs&hl=ru&sa=X&ved=0ahUKEwiKjYK97urTAhUJ_iwKHbfMA58Q6AEIYjAH#v=onepage&q&f=false

  8. http://www.nxp.com/assets/documents/data/en/application-notes/AN3441.pdf

Note that isync does not affect data accesses and does not wait for all stores to be performed.

  9. Page 77: https://www.setphaserstostun.org/power8/POWER8_UM_v1.3_16MAR2016_pub.pdf

3.5.7.2 Instruction Cache Block Invalidate (icbi)

As a result of this and other implementation-specific design optimizations, instead of requiring the instruction sequence specified by the Power ISA to be executed on a per cache-line basis, software must only execute a single sequence of three instructions to make any previous code modifications become visible: sync, icbi (to any address), isync.


ANSWER:

So isync doesn't guarantee Store-Load order: because "isync is not a memory barrier instruction", it doesn't guarantee that any previous stores will be visible to other CPU cores (which use sequential consistency) before the next instruction finishes. The instruction synchronization command isync guarantees only the order in which instructions start; it does not guarantee the order in which they complete, i.e. it does not guarantee the order in which their effects become visible to other CPU cores. Thus, isync allows the visible effects of the Store and the Load to be reordered in this code: stwcx. [a]=1; bne-; isync; lwz [b].
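If Store-Load order is actually required, the quoted sources point to one of the sync instructions rather than isync. A minimal C++ sketch of that stronger variant follows, assuming the compiler lowers a seq_cst fence to hwsync on Power (a commonly documented lowering, but an assumption here); the names a, b and increment_then_read are hypothetical.

```cpp
#include <atomic>

// Hypothetical shared locations, matching the [a] and [b] of the question.
std::atomic<int> a{0};
std::atomic<int> b{0};

// Store to a, full fence, then load from b.
// The seq_cst fence is what is expected to keep the store from becoming
// visible after the load; acquire ordering on the fetch_add alone would not.
int increment_then_read() {
    a.fetch_add(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return b.load(std::memory_order_relaxed);
}
```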

Alex asked May 12 '17 18:05


1 Answer

As you have guessed and most of your excellent sources imply, there are two properties of a memory access involved here:

Visibility

Whether other processors can observe the memory access.
Processor-specific buffers or caches can let a store complete on one processor while it is not yet visible to the others.

Ordering

When the memory access is executed with respect to other instructions on the same processor.


Ordering is an intra-processor aspect of a memory access: it controls the out-of-order capability of a processor.
Ordering cannot be done with respect to other processors' instructions.

Visibility is an inter-processor aspect: it ensures that the side effects of a memory access are visible to other processors (or, in general, to other agents).
The primary side effect of a store is changing a memory location.

By controlling both aspects it is possible to enforce an inter-processor ordering, that is, the order in which other processors see a sequence of memory accesses.
It goes without saying that the word "ordering" usually refers to this second meaning unless it is used in a context where no other agents are present.
It is admittedly confusing terminology.


Beware that I'm not well versed in the PowerPC architecture; I'm just applying the theory with the help of a few official documents found online and the quotes you provided.

isync, just like sc and rfi, is a context-synchronizing instruction. The main purpose of these instructions is to guarantee that subsequent instructions execute in the context established by the previous ones. For example, executing a system call changes the context, and we don't want the privileged code to execute in an unprivileged context or vice versa.

These instructions wait for all previously dispatched instructions to complete, but not for their effects to become visible:

All previously issued instructions have completed, at least to a point where they can no longer cause an exception.
However, memory accesses that these instructions cause need not have completed with respect to other processors and mechanisms.

So, depending on what you mean by reordering, isync does or does not prevent Load-Load, Load-Store etc. reordering.
It does prevent any such reordering from the perspective of the processor it is executed on (intra-processor reordering): all previous loads and stores are completed before isync completes, but they are not necessarily visible.
It does not prevent reordering from the perspective of other processors (inter-processor reordering), as it doesn't ensure the visibility of previous instructions.


But does isync prevent reordering of stwcx., bne <--> any following instructions?

Only intra-processor reordering.

I.e. can Store-stwcx. begin earlier than the following Load-lwz, and finish being performed later than Load-lwz?

Not from the point of view of the processor executing them: stwcx. is completed by the time lwz begins. But, to borrow Intel terminology, it is only completed locally, so other processors may not see it as completed by the time lwz begins.

I.e. can Store-stwcx. perform the Store to the Store-Buffer before the following Load-lwz begins, while the actual Store to the cache, visible to all CPU cores, occurs after the Load-lwz has finished?

Yes, exactly.
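The difference between "completed locally" and "visible to other processors" can be illustrated with a toy store-buffer model. This is a deliberately simplified, single-threaded simulation and not a description of how any real PowerPC core is built; the names ToyCore, Memory, store, load and drain are invented for this sketch.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Shared memory visible to every simulated core; missing addresses read as 0.
using Memory = std::map<std::string, int>;

// A simulated core with a private store buffer in front of shared memory.
struct ToyCore {
    Memory& shared;
    std::vector<std::pair<std::string, int>> store_buffer;

    explicit ToyCore(Memory& memory) : shared(memory) {}

    // The store "completes" locally: it only enters the private buffer.
    void store(const std::string& address, int value) {
        store_buffer.emplace_back(address, value);
    }

    // A load sees the core's own newest buffered store first, then memory.
    int load(const std::string& address) const {
        for (auto it = store_buffer.rbegin(); it != store_buffer.rend(); ++it) {
            if (it->first == address) {
                return it->second;
            }
        }
        auto found = shared.find(address);
        return found == shared.end() ? 0 : found->second;
    }

    // Draining the buffer is what makes the stores visible to other cores.
    void drain() {
        for (const auto& entry : store_buffer) {
            shared[entry.first] = entry.second;
        }
        store_buffer.clear();
    }
};
```

In this model, if core 0 stores a=1 and then loads b while its buffer has not been drained, a second core that loads a still reads 0: the store has been performed as far as core 0 is concerned, yet the load that follows it in program order takes effect before the store becomes visible elsewhere.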

Margaret Bloom answered Sep 22 '22 03:09