
StoreLoad Memory Barrier

I can't understand the definition of the StoreLoad barrier in the JSR-133 Cookbook.

Store1; StoreLoad; Load2

StoreLoad barriers protect against a subsequent load incorrectly using Store1's data value rather than that from a more recent store to the same location performed by a different processor.

Does it mean that, without a StoreLoad barrier, a processor can place the store Store1 in its write buffer and then satisfy Load2 from that write buffer, even though some other processor wrote to the same memory location and flushed it to cache between Store1 and Load2?

Alex asked Jan 15 '14 21:01


1 Answer

Yes, it's possible, depending on the memory ordering model.

A write buffer usually holds stores before they are dispatched to the cache, meaning that the stores in it are not yet observable to the outside world. However, for better performance, most micro-architectures allow younger loads on the same thread to execute early, and if a load's address matches that of a buffered store, the data can be forwarded to the load. This lets the program continue as fast as possible while the load still appears to have been performed after the store.

This works fine for intra-thread coherence, but when other processors access the same address and possibly change the data, that change may come too late for the load to see it (although on many CPUs it might still be caught if the load hasn't completed yet, in which case the machine will repair itself).
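To make the forwarding behaviour concrete, here is a minimal sketch of a toy store buffer in Python. It is only an illustration under simplifying assumptions, not a description of any real CPU: `ToyCpu`, `store_load_fence` and `run` are hypothetical names, the barrier is modelled as "drain the whole buffer", and the snoop-and-repair mechanism mentioned above is deliberately left out.

```python
class ToyCpu:
    """Toy CPU with a private FIFO store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory   # dict shared by all toy CPUs
        self.buffer = []       # pending (address, value) stores, oldest first

    def store(self, addr, value):
        # The store is only buffered: other CPUs cannot observe it yet.
        self.buffer.append((addr, value))

    def load(self, addr):
        # Store-to-load forwarding: the youngest buffered store to addr wins.
        for buffered_addr, value in reversed(self.buffer):
            if buffered_addr == addr:
                return value
        # No matching buffered store, so read shared memory (0 if unwritten).
        return self.memory.get(addr, 0)

    def store_load_fence(self):
        # Assumed model of a StoreLoad barrier: drain all buffered stores.
        for addr, value in self.buffer:
            self.memory[addr] = value
        self.buffer.clear()


def run(with_fence):
    memory = {}
    cpu0, cpu1 = ToyCpu(memory), ToyCpu(memory)
    cpu0.store("x", 1)             # Store1
    if with_fence:
        cpu0.store_load_fence()    # StoreLoad
    cpu1.store("x", 2)             # more recent store by another processor
    cpu1.store_load_fence()        # ... which becomes globally visible
    return cpu0.load("x")          # Load2


print(run(with_fence=False))  # 1: forwarded from cpu0's own buffer
print(run(with_fence=True))   # 2: read from memory after the buffer drained
```

In this model the barrier matters only because it empties the buffer before the load, so nothing is left to forward from.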

I'm not entirely sure what the quote is meant to explain, but I think it can be better demonstrated with this scenario:

CPU0:                              CPU1: 
store [x]<--1                      
                                   store [x]<--2
                                   store [y]<--2
load  r1<--[x]                     
load  r2<--[y]                     

A possible outcome (in theory) without the barrier is r1 == 1, r2 == 2, meaning that both stores by CPU1 were already performed (since we read 2 from [y]), but somehow the old value of [x] has survived (because it got forwarded).

I don't really like this example. First, as I said, most CPUs should successfully snoop out the old value of that load even after it was performed. Second, it's overly complicated, because they insisted on claiming:

a StoreLoad is strictly necessary only for separating stores from subsequent loads of the same location(s) as were stored before the barrier

That is wrong: a barrier is also necessary when the addresses differ, as can be seen in the following (classic) example:

CPU0:                              CPU1: 
store [x]<--1                      store [y]<--1
load  r1<--[y]                     load  r2<--[x]

Here the addresses are different, and still a barrier is required to prevent a case where both loads read the old values (even though both stores had to be performed to get there), thanks to out-of-order execution of loads. Note that this is a different problem than the one presented (store to load forwarding), but it proves the quote is wrong.
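The classic example can be checked by brute force in a toy model. The sketch below is an assumption-laden illustration, not real hardware: each CPU gets a FIFO store buffer, the barrier is modelled as "wait until your own buffer is empty", and the hypothetical function `outcomes` enumerates every interleaving of instructions and buffer drains. It models delayed stores rather than reordered loads, but it reaches the same forbidden-looking result.

```python
def outcomes(with_fence):
    """Return every (r1, r2) pair reachable in a toy store-buffer model."""
    programs = [
        [("store", "x"), ("load", "y")],   # CPU0: store [x]<--1, load r1<--[y]
        [("store", "y"), ("load", "x")],   # CPU1: store [y]<--1, load r2<--[x]
    ]
    if with_fence:
        # Put a StoreLoad fence between the store and the load on each CPU.
        programs = [[st, ("fence", None), ld] for st, ld in programs]

    results = set()

    def explore(pcs, buffers, memory, regs):
        if all(pcs[c] == len(programs[c]) for c in (0, 1)):
            results.add(regs)
            return
        for cpu in (0, 1):
            # Choice A: the oldest buffered store of this CPU drains to memory.
            if buffers[cpu]:
                addr = buffers[cpu][0]
                drained = list(buffers)
                drained[cpu] = buffers[cpu][1:]
                explore(pcs, tuple(drained), {**memory, addr: 1}, regs)
            # Choice B: this CPU executes its next instruction, if it has one.
            if pcs[cpu] == len(programs[cpu]):
                continue
            op, addr = programs[cpu][pcs[cpu]]
            if op == "fence" and buffers[cpu]:
                continue  # the fence blocks until the store buffer is empty
            next_pcs = tuple(pc + 1 if c == cpu else pc
                             for c, pc in enumerate(pcs))
            next_buffers, next_regs = list(buffers), list(regs)
            if op == "store":
                next_buffers[cpu] = buffers[cpu] + (addr,)
            elif op == "load":
                # Forward from the CPU's own buffer if possible, else read memory.
                hit = addr in buffers[cpu]
                next_regs[cpu] = 1 if hit else memory.get(addr, 0)
            explore(next_pcs, tuple(next_buffers), memory, tuple(next_regs))

    explore((0, 0), ((), ()), {}, (None, None))
    return results


print(sorted(outcomes(with_fence=False)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(with_fence=True)))   # [(0, 1), (1, 0), (1, 1)]
```

Without the fence the model reaches r1 == 0, r2 == 0 even though the two CPUs touch different addresses; with the fence that outcome disappears.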

Leeor answered Oct 16 '22 03:10