Out-of-order versus in-order execution in the context of code written in C/C++

Could anyone explain to me (in plain English) out-of-order versus in-order execution? I'm reading some theoretical texts on the subject and I feel that I can't quite grasp it. A small example in the context of C/C++ would help. What are the particularities with regard to multicore processors and multithreading?

celavek asked Aug 05 '11 22:08



2 Answers

Out-of-order execution is a technique used by engineers who create microprocessors. The result affects the way the microprocessor executes machine instructions, which we usually write using "assembly language."

It's important to realize that out-of-order execution is not something the programmer implements -- it is a mechanism inside the microprocessor. A programmer might write assembly code which makes clever use of a particular implementation, but the same code run on a later microprocessor might not benefit from that cleverness, because the two processors' designs may differ in what they execute out of order.

With that preamble, here is an example of a potential out-of-order execution:

  • Suppose we have a microprocessor which can execute two instructions at the same time.
  • The instructions are accessing the same set of registers, so the ability to execute two instructions at the same time is neither multicore nor multithread.
  • If an instruction changes a register, it cannot be executed at the same time as an instruction which reads or writes to that register -- because the intermediate result is not available and the register would receive a wrong result.
  • Some example program contains the following x86 assembly instructions:

    1) mov eax, 0
    2) mov ebx, 1
    3) mov edx, 2
    4) inc edx
    5) mov ecx, 3
    

During the first time slot, instructions (1) and (2) execute together because (2) does not depend on the result of (1).

During the second time slot, the microprocessor determines that (3) and (4) cannot execute together -- (4) uses the value of edx, which will not be correctly set to 2 until instruction (3) completes.

The microprocessor can be built to handle this in a couple of ways:

  1. The processor can "stall" (a "pipeline stall") and execute only a single instruction, (3), in this time slot. Instruction (4) then executes in the next time slot, probably simultaneously with (5).

  2. The processor may execute a different instruction out of order in place of (4). In this example, instructions (3) and (5) may execute simultaneously, because (5) does not depend on the result of either (3) or (4), and (4) is not made incorrect by the early execution of (5). Therefore, (5) may be executed out of order relative to (4).
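The two options can be sketched as a toy scheduler in C++. This is only a rough model under loud assumptions, not how any real processor is built: every instruction takes exactly one time slot, only register conflicts are considered, and the names `Instr`, `conflicts`, `schedule` and `example_program` are hypothetical, invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy description of one instruction: the register it writes and the
// registers it reads. (Hypothetical model, not a real ISA description.)
struct Instr {
    std::string text;
    std::string writes;              // destination register
    std::vector<std::string> reads;  // source registers (may be empty)
};

// True if `later` must not share a time slot with, or run ahead of, `earlier`.
bool conflicts(const Instr& earlier, const Instr& later) {
    if (!later.writes.empty() && later.writes == earlier.writes) return true;  // write after write
    for (const auto& r : later.reads)
        if (r == earlier.writes) return true;                                  // read after write
    for (const auto& r : earlier.reads)
        if (r == later.writes) return true;                                    // write after read
    return false;
}

// Returns, for each instruction, the time slot in which it issues.
// Assumption: an instruction issued in slot N has finished by slot N + 1.
std::vector<int> schedule(const std::vector<Instr>& prog, bool out_of_order, int width = 2) {
    assert(width > 0);
    std::vector<int> slot(prog.size(), -1);  // -1 means "not issued yet"
    std::size_t done = 0;
    for (int cycle = 0; done < prog.size(); ++cycle) {
        int issued = 0;
        for (std::size_t i = 0; i < prog.size() && issued < width; ++i) {
            if (slot[i] != -1) continue;  // already issued
            bool ready = true;
            for (std::size_t j = 0; j < i; ++j) {
                bool finished = slot[j] != -1 && slot[j] < cycle;
                if (!finished && conflicts(prog[j], prog[i])) { ready = false; break; }
            }
            if (ready) { slot[i] = cycle; ++issued; ++done; }
            else if (!out_of_order) break;  // in-order: stall, never look past a blocked instruction
        }
    }
    return slot;
}

// The five instructions from the example above.
std::vector<Instr> example_program() {
    return {
        {"mov eax, 0", "eax", {}},
        {"mov ebx, 1", "ebx", {}},
        {"mov edx, 2", "edx", {}},
        {"inc edx",    "edx", {"edx"}},
        {"mov ecx, 3", "ecx", {}},
    };
}
```

Under these assumptions, `schedule(example_program(), false)` assigns the slots 0, 0, 1, 2, 2 (the stall case), while `schedule(example_program(), true)` assigns 0, 0, 1, 2, 1: instruction (5) moves up alongside (3) while (4) waits one slot.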

It is worth realizing that the decisions about out-of-order execution are made in hardware, by the transistors and microcode inside the microprocessor, not by the programmer or the program.

Other worthy related topics include superscalar dispatch, speculative execution, and exception boosting or hoisting.

Heath Hunnicutt answered Oct 21 '22 13:10



A program consists of a series of instructions in memory. The processor reads the instructions in order and executes them. To the user, they appear to execute in order. However, the processor may speed execution by reordering them in time. This helps because some instructions are slower than others, and some fast instructions may not require the results of preceding slow instructions.

Here is a snippet of C. It doesn't really illustrate much because the compiler is allowed to reorder operations before they get to the CPU, but we can assume for the sake of argument that it doesn't.

int can_reorder() {
    int a = 4, b = 3;
    int c = a + b; // fast instruction
    int d = a / b; // slow instruction
    return c + a; // fast instruction; it does not need d, so it may complete before the division
}
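A slightly larger sketch of the same idea, written as C++: both functions below compute the same sum, but the second splits the work into two chains of additions that do not depend on each other, which is the kind of independence an out-of-order core can exploit. The function names are hypothetical, and whether the second version is actually faster depends on the compiler and the CPU (an optimizing compiler may transform either loop), so treat it as an illustration of dependencies, not as a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One accumulator: every addition needs the result of the previous one,
// so the additions form a single dependency chain.
long long sum_one_chain(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Two accumulators: additions into `even` never depend on additions into
// `odd`, so the hardware is free to overlap the two chains.
long long sum_two_chains(const std::vector<int>& v) {
    long long even = 0, odd = 0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    if (i < v.size()) even += v[i];  // leftover element when the size is odd
    return even + odd;
}
```

Both functions return the same value for any input; only the shape of the dependencies between the additions differs.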

Multithreading is totally orthogonal. (Almost) no instruction in thread A depends on the result of an instruction in thread B, so a CPU that runs several hardware threads is free to execute whichever thread is more convenient in a given execution unit on a given cycle.

Potatoswatter answered Oct 21 '22 13:10
