
Can memory stores really be reordered in an OoOE processor?

We know that two instructions can be reordered by an OoOE processor. For example, there are two global variables shared among different threads.

int data;
bool ready;

A writer thread produces data and sets the flag ready so that readers can consume that data.

data = 6;
ready = true;

Now, on an OoOE processor, these two instructions can be reordered (instruction fetch, execution). But what about the final commit/write-back of the results, i.e., will the stores be performed in order?

From what I have learned, this depends entirely on the processor's memory model. E.g., x86/64 has a strong memory model, and reordering of stores with other stores is disallowed. In contrast, ARM typically has a weak model where store reordering can happen (along with several other reorderings).

Also, my gut feeling tells me that I am right, because otherwise we wouldn't need the store barrier that typical multi-threaded programs place between those two instructions.

But here is what Wikipedia says:

.. In the outline above, the OoOE processor avoids the stall that occurs in step (2) of the in-order processor when the instruction is not completely ready to be processed due to missing data.

OoOE processors fill these "slots" in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal.

I'm confused. Is it saying that the results have to be written back in order? In an OoOE processor, can the stores to data and ready really be reordered?

asked Dec 04 '22 05:12 by Eric Z



1 Answer

The simple answer is YES, on some processor types.

Before it even reaches the CPU, your code faces an earlier problem: compiler reordering.

data = 6;
ready = true;

The compiler is free to rearrange these statements since, as far as it knows, they do not affect each other (it is not thread-aware).
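As a rough sketch of how to make the compiler aware of the ordering requirement (assuming C++11 or later; the function names publish and consume are made up for illustration), the flag can be declared as std::atomic<bool>. Earlier writes may not be moved after a release store, and the matching acquire load in the reader completes the guarantee:

```cpp
#include <atomic>
#include <thread>

int data = 0;                      // plain payload, as in the question
std::atomic<bool> ready{false};    // assumption: the flag is made atomic

// Hypothetical writer: the release store keeps the write to data from
// being moved after the write to ready, by the compiler or by the CPU.
void publish() {
    data = 6;
    ready.store(true, std::memory_order_release);
}

// Hypothetical reader: the acquire load pairs with the release store,
// so once ready is observed as true, data is guaranteed to read 6.
int consume() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();   // spin politely until the flag is set
    }
    return data;
}
```

With plain int and bool and no synchronization, the same code would be a data race, and the compiler would be allowed to swap the two stores.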

Now down to the processor level:

1) An out-of-order processor can process these instructions in a different order, including reversing the order of the stores.

2) Even if the CPU performs them in order, the memory controller may not perform them in order, because it may need to flush or bring in cache lines or do an address translation before it can write them.

3) Even if this doesn't happen, another CPU in the system may not see the stores in the same order. To observe them, it may need to bring in the modified cache lines from the core that wrote them. It may be unable to bring in one cache line before the other if that line is held by another core or is contended by multiple cores, and its own out-of-order execution may read one before the other.

4) Finally, speculative execution on another core may read the value of data before the writing core has set ready. By the time that core gets around to reading ready, the flag is already set, but data has been modified in the meantime, so the reader sees ready set together with a stale value of data.

These problems are all solved by memory barriers. Platforms with weakly ordered memory must use memory barriers to enforce the ordering that thread synchronization depends on.
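For illustration, the barriers can also be written out explicitly. The sketch below assumes C++11 and uses std::atomic_thread_fence; writer and reader are hypothetical names, and the flag is kept atomic (with relaxed accesses) so that the spin loop is not a data race:

```cpp
#include <atomic>
#include <thread>

int data = 0;
std::atomic<bool> ready{false};

// Hypothetical writer thread body.
void writer() {
    data = 6;
    // Release fence: the earlier write to data is ordered before
    // the later store to ready.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Hypothetical reader thread body.
int reader() {
    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();   // wait until the flag is set
    }
    // Acquire fence: the read of data below cannot be performed
    // before the load of ready that saw true.
    std::atomic_thread_fence(std::memory_order_acquire);
    return data;
}
```

On x86 these two fences typically compile to no instruction at all and only restrain the compiler, while on ARM they usually turn into real barrier instructions.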

answered Dec 31 '22 14:12 by Variable Length Coder
