Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory Model: preventing store-release and load-acquire reordering

It is known that, unlike Java's volatiles, .NET's ones allow reordering of volatile writes with the following volatile reads from another location. When it is a problem MemoryBarier is recommended to be placed between them, or Interlocked.Exchange can be used instead of volatile write.

It works but MemoryBarier could be a performance killer when used in highly optimized lock-free code.

I thought about it a bit and came with an idea. I want somebody to tell me if I took the right way.

So, the idea is the following:

We want to prevent reordering between these two accesses:

 volatile1 write

 volatile2 read

From .NET MM we know that :

 1) writes to a variable cannot be reordered with  a  following read from 
    the same variable
 2) no volatile accesses can be eliminated
 3) no memory accesses can be reordered with a previous volatile read 

To prevent unwanted reordering between write and read we introduce a dummy volatile read from the variable we've just written to:

 A) volatile1 write
 B) volatile1 read [to a visible (accessible | potentially shared) location]
 C) volatile2 read

In such case B cannot be reordered with A as they both access the same variable, C cannot be reordered with B because two volatile reads cannot be reordered with each other, and transitively C cannot be reordered with A.

And the question:

Am I right? Can that dummy volatile read be used as a lightweight memory barrier for such scenario?

like image 708
OmariO Avatar asked May 15 '13 18:05

OmariO


People also ask

What is memory_ order_ seq_ cst?

The default is std::memory_order_seq_cst which establishes a single total ordering over all atomic operations tagged with this tag: all threads see the same order of such atomic operations and no memory_order_seq_cst atomic operations can be reordered.

What is memory order in c++?

(since C++20) std::memory_order specifies how memory accesses, including regular, non-atomic memory accesses, are to be ordered around an atomic operation.

What is memory model in C ++ 11?

C++11 Memory Model. A memory model, a.k.a memory consistency model, is a specification of the allowed behavior of multithreaded programs executing with shared memory [1].

What is acquire and release semantics?

An operation has acquire semantics if other processors will always see its effect before any subsequent operation's effect. An operation has release semantics if other processors will see every preceding operation's effect before the effect of the operation itself.


2 Answers

Here I will use an arrow notation to conceptualize the memory barriers. I use an up arrow ↑ and a down arrow ↓ to represent volatile writes and reads respectively. Think of the arrow head as pushing away any other reads or writes. So no other memory access can move past the arrow head, but they can move past the tail.

Consider your first example. This is how it would be conceptualized.

↑          
volatile1 write  // A
volatile2 read   // B
↓

So clearly we can see that the read and the write are allowed to switch positions. You are correct.

Now consider your second example. You claimed that introducing a dummy read would prevent the write of A and the read of B from getting swapped.

↑          
volatile1 write  // A
volatile1 read   // A
↓
volatile2 read   // B
↓

We can see that B is prevented from floating up by the dummy read of A. We can also see that the read of A cannot float down because, by inference, that would be the same as B moving up before A. But, notice that we have no ↑ arrow that would prevent the write to A from floating down (remember it can still move past the tail of an arrow). So no, at least theoretically, injecting a dummy read of A will not prevent the original write of A and the read of B from getting swapped because the write to A is still allowed to move downward.

I had to really think about this scenario. One thing I pondered for a quite some time is whether the read and write to A are locked together in tandem. If so then that would prevent the write to A from moving down because it would have to take the read with it which we already said was prevented. So if you go with that school of thought then your solution might just work. But, I read the specification again and I see nothing special mentioned about volatile accesses to the same variable. Obviously, the thread has to execute in a manner that is logically consistent with the original program sequence (that is mentioned in the specification). But, I can visualize ways the compiler or hardware could optimize (or otherwise reorder) that tandem access of A and still get the same result. So, I simply have to side with caution here and assume that the write to A can move down. Remember, a volatile read does not mean "fresh read from main memory". The write to A could be cached in a register and then the read comes from that register delaying the actual write to a later time. Volatile semantics do not prevent that scenario as far as I know.

The correct solution would be to put a call to Thread.MemoryBarrier in between the accesses. You can see how this is conceptualized with the arrow notation.

↑          
volatile1 write       // A
↑
Thread.MemoryBarrier
↓
volatile2 read        // B
↓

Now you can see that the read is not allowed to float up and the write is not allowed to float down preventing the swap.


You can see some of my other memory barrier answers using this arrow notation here, here, and here just to name a few.

like image 96
Brian Gideon Avatar answered Oct 02 '22 15:10

Brian Gideon


I forgot to post the soon found answer back to SO. Better late than never..

Turns out it is impossible thanks to how processors (at least x86-x64 kind of them) optimize memory accesses. I found the answer when was reading Intel manuals on its procs. Example 8-5:" Intra-Processor Forwarding is Allowed" was looking suspicious. Googling for "store buffer forwarding" lead to Joe Duffy's blog posts (first and second - read them pls).

To optimize writes processor uses store buffers (per processor queues of write ops). Buffering writes locally allows it to do next optimization: satisfying reads from the previously buffered writes to the same memory location and which haven't left the processor yet. The technique is called store-buffer forwarding (or store-to-load forwarding).

The end result in our case is that as reading at B is satisfied from a local storage (store buffer) it is not considered a volatile read and can be reordered with further volatile reads from another memory location (C).

It seems like a violation of the rule "Volatile reads don't reorder with each other". Yes, it is a violation, but very rare and exotic one. Why did it happen? Probably because Intel's released its first formal document on memory model of its processors years after .NET (and its JIT compiler) saw the sunlight.

So the answer is: no, the dummy reading (B) doesn't prevent reordering between A and C and cannot be used as a lightweight memory barrier.

like image 42
OmariO Avatar answered Oct 02 '22 16:10

OmariO