What is the general difference between superscalar and out-of-order (OoO) execution?

1 Answer

Superscalar microprocessors can execute two or more instructions at the same time. For example, they typically have at least two ALUs (although a superscalar processor might have one ALU and some other execution unit, such as a shifter or jump unit).

More precisely, superscalar processors can start executing two or more instructions in the same cycle. Pipelined processors can also have more than one instruction in execution at a time, because a pipelined execution unit takes multiple cycles to execute an instruction end to end, but a non-superscalar pipelined processor will only start a single instruction in any given cycle. Put another way: a superscalar processor can usually start two non-pipelined, single-cycle-latency instructions in the same cycle, whereas a non-superscalar pipelined processor cannot have two single-cycle instructions executing in its ALUs at the same time.
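
To make the "start per cycle" distinction concrete, here is a minimal Python sketch. It is a toy model, not a description of any real processor: it assumes every instruction is independent of the others, and the helper names (`start_cycles`, `in_flight`, `max_starts_per_cycle`) are made up for this illustration.

```python
from collections import Counter

def start_cycles(num_instructions, issue_width):
    """Start cycle of each instruction, assuming all are independent (toy model)."""
    return [i // issue_width for i in range(num_instructions)]

def in_flight(starts, latency, cycle):
    """Number of instructions that are executing during the given cycle."""
    return sum(1 for s in starts if s <= cycle < s + latency)

def max_starts_per_cycle(starts):
    """Largest number of instructions that start in any single cycle."""
    return max(Counter(starts).values())

scalar = start_cycles(4, issue_width=1)       # [0, 1, 2, 3]
superscalar = start_cycles(4, issue_width=2)  # [0, 0, 1, 1]

# Scalar machine with a 3-cycle pipelined unit: three instructions overlap
# in cycle 2, yet only one instruction starts in any cycle.
print(in_flight(scalar, latency=3, cycle=2), max_starts_per_cycle(scalar))  # 3 1

# Single-cycle instructions in cycle 0: the scalar machine has one in an ALU,
# the 2-wide superscalar machine has two.
print(in_flight(scalar, 1, 0), in_flight(superscalar, 1, 0))  # 1 2
```

The first printed line shows that pipelining alone gives overlap without multiple starts per cycle; the second shows the extra parallelism that only issue width provides.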

Out-of-order processors can execute instructions out of the original program order. For example, in the following sequence, where MULTIPLY takes 5 cycles, instruction 3 may execute before instruction 2, because instruction 2 is waiting for the 5-cycle result of the MULTIPLY in instruction 1:

1: MULTIPLY reg1 := reg2 * reg3
2: ADD reg4 := reg1 + 5
3: ADD reg6 := reg2 + 1

Most out-of-order processors are also superscalar. However, you can imagine building an out-of-order processor that is not superscalar, one that can only initiate one operation on a pipelined ALU per cycle. (I proposed such designs, when I was employed by Intel, as low-power chips. Heck, you can build out-of-order processors that are only half-way scalar, e.g. that have only a 16-bit-wide ALU, taking 2 cycles for a 32-bit add, etc. But that's stretching.)

Many superscalar processors, however, are not out-of-order. In the example above, an in-order superscalar would execute instruction 1 first. It would not start instruction 3, but would wait until instruction 2 could start, at which time it would start instructions 2 and 3 together.
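
The schedules described above can be reproduced with a small Python sketch. This is a simplified model under stated assumptions, not how any real core is built: only true (read-after-write) dependencies are tracked, a result is usable `latency` cycles after its instruction starts, the only resource limit is the issue width, and the names `Instr`, `PROGRAM`, `true_dependencies` and `schedule` are invented for this illustration.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str
    dest: str
    srcs: tuple
    latency: int

# The three-instruction example, with MULTIPLY taking 5 cycles.
# The 1-cycle ADD latency is an assumption of this sketch.
PROGRAM = [
    Instr("MULTIPLY reg1 := reg2 * reg3", "reg1", ("reg2", "reg3"), 5),
    Instr("ADD reg4 := reg1 + 5", "reg4", ("reg1",), 1),
    Instr("ADD reg6 := reg2 + 1", "reg6", ("reg2",), 1),
]

def true_dependencies(program):
    """For each instruction, indices of the older instructions producing its sources."""
    last_writer, deps = {}, []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i
    return deps

def schedule(program, width, in_order):
    """Return the cycle in which each instruction starts executing."""
    deps = true_dependencies(program)
    start = [None] * len(program)
    cycle = 0
    while None in start:
        issued = 0
        for i, ins in enumerate(program):  # oldest instruction first
            if issued == width:
                break
            if start[i] is not None:
                continue
            ready = all(start[p] is not None
                        and start[p] + program[p].latency <= cycle
                        for p in deps[i])
            if ready:
                start[i] = cycle
                issued += 1
            elif in_order:
                break  # younger instructions may not pass a stalled one
        cycle += 1
    return start

print(schedule(PROGRAM, width=2, in_order=True))   # [0, 5, 5]
print(schedule(PROGRAM, width=2, in_order=False))  # [0, 5, 0]
print(schedule(PROGRAM, width=1, in_order=False))  # [0, 5, 1]
```

In this model the 2-wide in-order machine starts instructions 2 and 3 together in cycle 5, the 2-wide out-of-order machine starts instruction 3 alongside the MULTIPLY, and the 1-wide out-of-order machine still lets instruction 3 pass instruction 2 while starting only one instruction per cycle.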

Sometimes you have to think about unlikely limiting cases, such as 1-wide or half-wide OoO machines, to understand the concepts.


answered Oct 25 '22 01:10

Krazy Glew


Tags:

cpu-architecture

cpu

cloudygoose
