Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Out-of-order execution and reordering: can I see what after barrier before the barrier?

According to wikipedia: A memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.

Usually, articles talking about something like (I will use monitors instead of membars):

class ReadWriteExample {                                          
    int A = 0;    
    int Another = 0;                                                

    //thread1 runs this method                                    
    void writer () {                                              
      lock monitor1;   //a new value will be stored            
      A = 10;          //stores 10 to memory location A        
      unlock monitor1; //a new value is ready for reader to read
      Another = 20; //@see my question  
    }                                                             

    //thread2 runs this method                                    
    void reader () {                                              
      lock monitor1;  //a new value will be read               
      assert A == 10; //loads from memory location A
      print Another //@see my question           
      unlock monitor1;//a new value was just read              
   }                                                              
}       

But I wonder is it possible that compiler or cpu will shuffle the things around in a such way that code will print 20? I don't need guarantee.

I.e. by definition operations issued prior to barrier can't be pushed down by compiler, but is it possible that operations issued after barrier would be occasionally seen before barrier? (just a probability)

Thanks

like image 644
Vadim Kirilchuk Avatar asked Apr 08 '15 16:04

Vadim Kirilchuk


3 Answers

My answer below only addresses Java's memory model. The answer really can't be made for all languages as each may define the rules differently.

But I wonder is it possible that compiler or cpu will shuffle the things around in a such way that code will print 20? I don't need guarantee.

Your answer seems to be "Is it possible for the store of A = 20, be re-ordered above the unlock monitor?"

The answer is yes, it can be. If you look at the JSR 166 Cookbook, the first grid shown explains how re-orderings work.

In your writer case the first operation would be MonitorExit the second operation would be NormalStore. The grid explains, yes this sequence is permitted to be re-ordered.

This is known as Roach Motel ordering, that is, memory accesses can be moved into a synchronized block but cannot be moved out


What about another language? Well, this question is too broad to answer all questions as each may define the rules differently. If this is the case you would need to refine your question.

like image 74
John Vint Avatar answered Sep 25 '22 23:09

John Vint


In Java there is the concept of happens-before. You can read all the details about it on in the Java Specification. A Java compiler or runtime engine can re-order code but it must abide by the happens-before rules. These rules are important for a Java developer that wants to have detailed control on how their code is re-ordered. I myself have been burnt by re-ordering code, turns out I was referencing the same object via two different variables and the runtime engine re-ordered my code not realizing that the operations were on the same object. If I had either a happens-before (between the two operations) or used the same variable, then the re-ordering would not have occurred.

Specifically:

It follows from the above definitions that:

An unlock on a monitor happens-before every subsequent lock on that monitor.

A write to a volatile field (§8.3.1.4) happens-before every subsequent read of that field.

A call to start() on a thread happens-before any actions in the started thread.

All actions in a thread happen-before any other thread successfully returns from a join() on that thread.

The default initialization of any object happens-before any other actions (other than default-writes) of a program.

like image 28
Jose Martinez Avatar answered Sep 26 '22 23:09

Jose Martinez


Short answer - yes. This is very compiler and CPU architecture dependent. You have here the definition of a Race Condition. The scheduling Quantum won't end mid-instruction (can't have two writes to same location). However - the quantum could end between instructions - plus how they are executed out-of-order in the pipeline is architecture dependent (outside of the monitor block).

Now comes the "it depends" complications. The CPU guarantees little (see race condition). You might also look at NUMA (ccNUMA) - it is a method to scale CPU & Memory access by grouping CPUs (Nodes) with local RAM and a group owner - plus a special bus between Nodes.

The monitor doesn't prevent the other thread from running. It only prevents it from entering the code between the monitors. Therefore when the Writer exits the monitor-section it is free to execute the next statement - regardless of the other thread being inside the monitor. Monitors are gates that block access. Also - the quantum could interrupt the second thread after the A== statement - allowing Another to change value. Again - the quantum won't interrupt mid-instruction. Always think of threads executing in perfect parallel.

How do you apply this? I'm a bit out of date (sorry, C#/Java these days) with current Intel processors - and how their Pipelines work (hyperthreading etc). Years ago I worked with a processor called MIPS - and it had (through compiler instruction ordering) the ability to execute instructions that occurred serially AFTER a Branch instruction (Delay Slot). On this CPU/Compiler combination - YES - what you describe could happen. If Intel offers the same - then yes - it could happen. Esp with the NUMA (both Intel & AMD have this, I'm most familiar with AMD implementation).

My point - if threads were running across NUMA nodes - and access was to the common memory location then it could occur. Of course the OS tries hard to schedule operations within the same node.

You might be able to simulate this. I know C++ on MS allows access to NUMA technology (I've played with it). See if you can allocate memory across two nodes (placing A on one, and Another on the other). Schedule the threads to run on specific Nodes.

What happens in this model is that there are two pathways to RAM. I suppose this isn't what you had in mind - probably only a single path/Node model. In which case I go back to the MIPS model I described above.

I assumed a processor that interrupts - there are others that have a Yield model.

like image 32
ripvlan Avatar answered Sep 24 '22 23:09

ripvlan