
Can out-of-order execution lead to speculative memory accesses?

When an out-of-order processor encounters something like

LOAD R1, 0x1337
LOAD R2, $R1
LOAD R3, 0x42

Assuming that all accesses result in a cache miss, can the processor ask the memory controller for the contents of 0x42 before it asks for the contents of $R1 or even 0x1337? If so, and assuming that accessing $R1 raises an exception (e.g., a segmentation fault), we can consider that 0x42 was loaded speculatively, correct?

And by the way, when a load-store unit sends a request to the memory controller, can it send a second request before receiving the answer to the previous one?

My question doesn't target any architecture in particular. Answers related to any mainstream architecture are welcome.

João Fernandes asked Dec 12 '22 22:12



2 Answers

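The answer depends on the memory ordering model of your CPU, which is not the same thing as whether the CPU executes instructions out of order. If the CPU implements total store ordering (e.g., x86 or SPARC), loads are not reordered with other loads as far as software can observe, so 0x42 will not appear to be loaded before 0x1337. The hardware may still issue the later load early and speculatively, provided it can discard the result whenever the reordering would otherwise become visible.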

If the CPU implements a relaxed memory model (e.g., IA-64, PowerPC, Alpha), then in the absence of a memory fence instruction all bets are off as to which location will be accessed first. This should be of little relevance unless you are doing I/O or dealing with multi-threaded code.
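For the multi-threaded case, here is a minimal C++ sketch of where a fence (or, equivalently here, a release/acquire pair) matters. The function name `message_passing_demo` and the variables `payload`, `ready`, and `seen` are made up for illustration. On a relaxed machine, the compiler is expected to emit whatever barrier the target needs so that the write to `payload` becomes visible before the flag does.

```cpp
#include <atomic>
#include <thread>

// Illustrative only: one thread publishes a value, another waits for it.
int message_passing_demo() {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the data
    int seen = -1;

    std::thread consumer([&] {
        // Acquire load: later reads cannot be moved before this one.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;               // guaranteed to observe 42
    });

    payload = 42;
    // Release store: earlier writes cannot be moved after this one.
    ready.store(true, std::memory_order_release);

    consumer.join();
    return seen;
}
```

Calling `message_passing_demo()` returns 42. If both operations on `ready` used `std::memory_order_relaxed` instead, the read of `payload` would be a data race and nothing would be guaranteed.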

Note that some CPUs (e.g., Itanium) have a relaxed memory model (so reads may be reordered) but do NOT have any out-of-order execution logic: they expect the compiler to order instructions, including speculative ones, optimally rather than spending silicon area on OOE.

camelccc answered Dec 31 '22 13:12



This would seem to be a logical conclusion for superscalar CPUs with multiple load-store units, too. Multi-channel memory controllers are pretty common these days.

In the case of out-of-order instruction execution, an enormous amount of logic is expended in determining whether instructions have dependencies on others in the stream - not just register dependencies but also dependencies through memory. There's also an enormous amount of logic for handling exceptions: the CPU needs to complete all instructions in the stream up to the fault (or alternatively, offload some parts of this onto the operating system).
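To make the dependency distinction concrete, here is a minimal C++ sketch of the three loads from the question. The function name `three_loads` and its parameters `p` and `q` are made up for illustration: `p` plays the role of address 0x1337 and `q` the role of 0x42.

```cpp
// Illustrative only: mirrors the three LOADs in the question.
int three_loads(const int* const* p, const int* q) {
    const int* r1 = *p;  // LOAD R1, 0x1337
    int r2 = *r1;        // LOAD R2, $R1  : address depends on r1, so it must wait
    int r3 = *q;         // LOAD R3, 0x42 : independent of r1 and r2
    return r2 + r3;
}
```

Only the second load has a data dependency, so it cannot be issued until the first one returns. Nothing ties the third load to the other two, which is why an out-of-order core is free to start it early and discard the result if an earlier instruction faults.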

In terms of the programming model seen by most applications, the effects are never apparent. As seen by memory, it's implicit that loads will not always happen in the sequence expected - but this is the case anyway when caches are in use.

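Clearly, in circumstances where the order of loads and stores does matter - for instance when accessing device registers - reordering must be prevented, typically with a barrier instruction or by mapping the region as uncached, strongly ordered memory. The POWER architecture has the wonderful EIEIO (Enforce In-order Execution of I/O) instruction for this purpose.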
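As a rough sketch of the device-register case in C++: the `FakeDevice` struct, its `data` and `doorbell` fields, and the `io_barrier` and `submit` helpers are all hypothetical, and the "device" is ordinary memory so that the code can run anywhere. `std::atomic_thread_fence` is only a portable stand-in; a real driver would use whatever barrier its platform documents for I/O (EIEIO on POWER, for example).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical register block. On real hardware this would be a
// memory-mapped region rather than an ordinary struct.
struct FakeDevice {
    volatile std::uint32_t data;      // payload register
    volatile std::uint32_t doorbell;  // writing 1 tells the device to go
};

// Stand-in for a platform-specific I/O barrier (assumption, see above).
inline void io_barrier() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

// The payload must reach the device before the doorbell does.
void submit(FakeDevice& dev, std::uint32_t value) {
    dev.data = value;   // 1. write the payload
    io_barrier();       // 2. keep the two stores in program order
    dev.doorbell = 1;   // 3. ring the doorbell
}
```

`volatile` stops the compiler from eliding or reordering the two register writes relative to each other; the barrier is what addresses reordering by the hardware.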

Some members of the ARM Cortex-A family offer OOE. The ARM architecture has a weakly ordered memory model and provides barrier instructions (DMB, DSB, ISB) for forcing ordering, so load-stores to normal memory should not be assumed to complete in order.

marko answered Dec 31 '22 14:12
