I've been reading some material on superscalr and OoO and I am confused.
I think their architecture graphs look very much the same.
An in-order implementation would just stall until the data becomes available, whereas an out of order implementation can (provided there are instructions ahead that can't be executed independently) get something else done while the processor waits for the data to be delivered from memory.
Many superscalar processors, however, are not out-of-order. In the example above, an in-order superscalar would execute instruction 1 first. It would not start instruction 3, but would wait until instruction 2 could start - at which time it would start instruction 2 and 3 together.
Out-of-order execution (OoOE) is an approach to processing that allows instructions for high-performance microprocessors to begin execution as soon as their operands are ready. Although instructions are issued in-order, they can proceed out-of- order with respect to each other.
The benefit of OoOE processing grows as the instruction pipeline deepens and the speed difference between main memory (or cache memory) and the processor widens.
Superscalar microprocessors can execute two or more instructions at the same time. E.g. typically they have at least 2 ALUs (although a superscalar processor might have 1 ALU and some other execution unit, like a shifter or jump unit.)
More precisely, superscalar processors can start executing two or more instructions in the same cycle. Pipelined processors can execute more than one instruction at a time, but a non-superscalar pipelined processor will only start a single instruction in any given cycle. Pipelined execution units take multiple cycles to execute end to end. Put another way: superscalar processors are usually capable of executing two non-pipelined instructions with single cycle latency per cycle, whereas non-superscalar pipelined processors cannot have two single cycle instructions in execution in the ALUs at the same time.
Out-of-order processors can execute instructions out of the original order. For example, in the following, where MULTIPLY takes 5 cycles, instruction 3 may execute before instruction 2 - because instruction 2 is waiting for the 5 cycle result of the MULTIPLY of instruction 1:
1: MULTIPLY reg1 := reg2 * reg3
2: ADD reg4 := reg1 + 5
3: ADD reg6 := reg2 + 1
Most out-of-order processors are also superscalar. However you can imagine building an out-of-order processor that is not superscalar, that can only initiate one operation on a pipelined ALU per cycle. (I have proposed such operations, when employed by Intel, as low power chips. Heck, you can build out-of-order processors that are only half-way scalar, e.g. that have only a 16 bit wide ALU, taking 2 cycles for a 32 bit add, etc. But that's stretching.)
Many superscalar processors, however, are not out-of-order. In the example above, an in-order superscalar would execute instruction 1 first. It would not start instruction 3, but would wait until instruction 2 could start - at which time it would start instruction 2 and 3 together.
Sometimes you have to think about unlikely limit cases, such as 1-wide or half-wide OOO machines, to understand the concepts.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With