In Java, when we have two threads sharing the following variables:
int a;
volatile int b;
if thread 1 does:
a = 5;
b = 6;
Then a StoreStore barrier is inserted between these two instructions, and 'a' is flushed back to main memory.
Now if thread 2 does:
if(b == 6)
a++;
a LoadLoad barrier is inserted between the two reads, and we have a guarantee that if the new value of 'b' is visible, then the new value of 'a' is visible as well. But how is this actually achieved? Does LoadLoad invalidate the CPU caches/registers? Or does it just instruct the CPU to re-fetch from memory the values of the variables read after the volatile read?
I have found this information about the LoadLoad barrier (http://gee.cs.oswego.edu/dl/jmm/cookbook.html):
LoadLoad Barriers The sequence: Load1; LoadLoad; Load2 ensures that Load1's data are loaded before data accessed by Load2 and all subsequent load instructions are loaded. In general, explicit LoadLoad barriers are needed on processors that perform speculative loads and/or out-of-order processing in which waiting load instructions can bypass waiting stores. On processors that guarantee to always preserve load ordering, the barriers amount to no-ops.
but it does not really explain how this is achieved.
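To make the scenario concrete, here is a minimal runnable sketch of the writer/reader pattern from the question (class and method names are mine). Because the volatile write to 'b' happens-before the read that observes b == 6, the reader is guaranteed to see a == 5:

```java
public class VolatileVisibility {
    static int a;          // plain field
    static volatile int b; // volatile field

    // Runs the writer/reader pair once and returns the value of 'a'
    // that the reader observed after seeing b == 6.
    static int observedA() {
        a = 0;
        b = 0;
        final int[] seen = new int[1];
        Thread writer = new Thread(() -> {
            a = 5; // plain write
            b = 6; // volatile write: the write to 'a' is ordered before it
        });
        Thread reader = new Thread(() -> {
            while (b != 6) { } // spin until the volatile write becomes visible
            seen[0] = a;       // guaranteed to read 5, never 0
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(observedA()); // prints 5
    }
}
```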
In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.
Memory barriers are implemented by the processor hardware, and CPUs of different architectures provide different memory-barrier instructions. The programmer (or the compiler/JVM on the programmer's behalf) therefore has to emit the appropriate barrier explicitly to solve the ordering problem above.
For example, the GCC builtin __sync_synchronize issues a full memory barrier when invoked: no load or store can be reordered across it.
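Since Java 9, the standard library exposes the same primitives directly through VarHandle. VarHandle.fullFence() is the Java-level analogue of a full barrier like __sync_synchronize, and weaker variants correspond to the cookbook's barrier types (the mapping in the comments is my own sketch):

```java
import java.lang.invoke.VarHandle;

public class Fences {
    static int a;
    static int b;

    static int demo() {
        a = 5;
        // Full two-way barrier: no load or store may be reordered across it.
        // Roughly the Java-level analogue of GCC's __sync_synchronize.
        VarHandle.fullFence();
        b = 6;

        // Weaker variants, matching the cookbook's barrier types:
        VarHandle.loadLoadFence();   // LoadLoad
        VarHandle.storeStoreFence(); // StoreStore
        VarHandle.acquireFence();    // LoadLoad + LoadStore
        VarHandle.releaseFence();    // LoadStore + StoreStore
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 11
    }
}
```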
I will give one example of how this is achieved. You can read more on the details here. For x86 processors, as you indicated, LoadLoad ends up being a no-op. In the article I linked, Mark points out that
Doug lists the StoreStore, LoadLoad and LoadStore
So in essence, the only barrier needed on x86 architectures is a StoreLoad. How is this achieved at a low level?
This is an excerpt from the blog:
Here's the code it generated for both volatile and non-volatile reads:
nop ;*synchronization entry
mov 0x10(%rsi),%rax ;*getfield x
And for volatile writes:
xchg %ax,%ax
movq $0xab,0x10(%rbx)
lock addl $0x0,(%rsp) ;*putfield x
The lock-prefixed instruction is the StoreLoad barrier listed in Doug's cookbook. But a locked instruction also synchronizes all reads with other processors, as documented:
Locked instructions can be used to synchronize data written by one processor and read by another processor.
This reduces the overhead of having to issue LoadLoad and LoadStore barriers for volatile loads.
All that being said, I will reiterate what assylias noted: the way it happens should not be important to a developer (if you are interested as a processor/compiler implementer, that is another story). The volatile keyword is, in effect, an interface between your code and the JVM: you ask for the Java Memory Model's visibility and ordering guarantees, and the JVM emits whatever barriers the target CPU requires.
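That "interface" view is exactly the safe-publication idiom. In the sketch below (class names and the port value are my own illustration), any thread that sees the non-null reference through the volatile field is guaranteed to see a fully constructed object, without the programmer ever naming a barrier:

```java
public class SafePublication {
    // Illustrative class; the field and its value are my own example.
    static final class Config {
        final int port;
        Config(int port) { this.port = port; }
    }

    static volatile Config config; // the volatile "interface": publish here

    static int publishAndRead() {
        Thread publisher = new Thread(() -> config = new Config(8080));
        publisher.start();
        try {
            publisher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Any thread that observes the non-null reference also sees a fully
        // constructed Config; the JVM emits whatever barriers the CPU needs
        // (e.g. lock addl on x86 for the volatile store).
        return config.port;
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints 8080
    }
}
```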