This may look like too simple a question to ask, but I'm asking after going through a few slide decks on both.
Both methods increase instruction throughput, and superscalar designs almost always make use of pipelining as well. A superscalar processor has more than one execution unit, and so does a pipelined one, or am I wrong here?
A microprocessor that uses pipelining or superscalar technology is said to have a pipelined or superscalar design. Pipelining allows the processor to fetch a new instruction from memory before it has finished processing the current one.
These ranks are given because, in the idealized comparison, a superscalar processor executes two instructions per clock cycle, a super-pipelined processor executes one and a half instructions per clock cycle, and a plain pipelined processor executes one instruction per clock cycle.
Superscalar architecture is a method of parallel computing used in many processors. In a superscalar computer, the central processing unit (CPU) manages multiple instruction pipelines to execute several instructions concurrently during a clock cycle.
Super-pipelining is based on dividing the stages of a pipeline into several substages, which increases the number of instructions the pipeline handles at the same time [12]. For example, by dividing each stage into two substages, a pipeline can be clocked twice as fast and so, in the ideal situation, perform at twice the speed.
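A rough way to check the "twice the speed" figure is to count cycles. The sketch below is an idealized model, not a description of any real processor: it assumes no stalls, stages that split into perfectly equal substages, and made-up numbers (5 stages, 1 ns per stage, 1000 instructions). The function name `total_time_ns` is hypothetical.

```python
def total_time_ns(n_instructions, stages, stage_time_ns, substages=1):
    """Ideal time to run n instructions through a pipeline with no stalls.

    Each original stage is split into `substages` equal pieces, so the
    pipeline gets deeper while the clock period gets shorter.
    """
    depth = stages * substages            # total pipeline depth
    period = stage_time_ns / substages    # shorter clock period (assumed ideal split)
    cycles = depth + n_instructions - 1   # fill the pipe once, then one result per cycle
    return cycles * period


# Assumed example numbers: 5 stages of 1 ns each, 1000 instructions.
base = total_time_ns(1000, stages=5, stage_time_ns=1.0)
deep = total_time_ns(1000, stages=5, stage_time_ns=1.0, substages=2)
print(f"plain pipeline: {base} ns, super-pipelined: {deep} ns, ratio {base / deep:.2f}")
```

The ratio approaches 2 as the instruction count grows, but never quite reaches it, because the deeper pipeline takes more cycles to fill.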
Superscalar design involves the processor being able to issue multiple instructions in a single clock, with redundant facilities to execute an instruction. We're talking about within a single core, mind you -- multicore processing is different.
Pipelining divides an instruction into steps, and since each step is executed in a different part of the processor, multiple instructions can be in different "phases" each clock.
They're almost always used together. This image from Wikipedia shows both concepts in use, as these concepts are best explained graphically:
Here, two instructions are being executed at a time in a five-stage pipeline.
To break it down further, given your recent edit:
In the example above, an instruction goes through 5 stages to be "performed". These are IF (instruction fetch), ID (instruction decode), EX (execute), MEM (memory access), WB (write the result back to the register file).
In a very simple processor design, every clock a different stage would be completed so we'd have:
Which would do one instruction in five clocks. If we then add a redundant execution unit and introduce superscalar design, we'd have this, for two instructions A and B:
Two instructions in five clocks -- a theoretical maximum gain of 100%.
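The two non-pipelined cases above can be sketched as a small timeline printer. This is only an illustration of the clock counting, with hypothetical helper names (`unpipelined_timeline`, `show`); it assumes each stage takes exactly one clock and that a new group of instructions starts only after the previous group has finished.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def unpipelined_timeline(names, width=1):
    """Return {instruction: [stage or '' per clock]} with no overlap between groups.

    `width` instructions start together (the superscalar part); the next
    group starts only after the previous one has completed all stages.
    """
    groups = [names[i:i + width] for i in range(0, len(names), width)]
    total = len(groups) * len(STAGES)
    timeline = {}
    for g, group in enumerate(groups):
        start = g * len(STAGES)
        for name in group:
            row = [""] * total
            row[start:start + len(STAGES)] = STAGES
            timeline[name] = row
    return timeline


def show(timeline):
    """Print one row per instruction, one column per clock."""
    clocks = len(next(iter(timeline.values())))
    print("clk " + " ".join(f"{c:>3}" for c in range(1, clocks + 1)))
    for name, row in timeline.items():
        print(f"{name:>3} " + " ".join(f"{s:>3}" for s in row))


show(unpipelined_timeline(["A"]))                 # one instruction, five clocks
show(unpipelined_timeline(["A", "B"], width=2))   # two instructions, still five clocks
```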
Pipelining lets the stages of different instructions overlap, so on top of the two-wide superscalar design we would end up with something like (for ten instructions A through J):
In nine clocks, we've executed ten instructions -- you can see where pipelining really moves things along. And that is an explanation of the example graphic, not how it's actually implemented in the field (that's black magic).
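The nine-clock figure can be reproduced with a short sketch. It assumes the idealized case the example describes: a five-stage pipeline, two instructions issued per clock, and no stalls or dependencies. The function name `pipelined_timeline` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipelined_timeline(names, width=2):
    """Return {instruction: [stage or '.' per clock]} for an ideal pipeline.

    A new group of `width` instructions enters the pipeline every clock,
    so group g occupies clocks g+1 through g+5.
    """
    groups = [names[i:i + width] for i in range(0, len(names), width)]
    total = len(STAGES) + len(groups) - 1   # fill time plus one clock per extra group
    timeline = {}
    for g, group in enumerate(groups):
        for name in group:
            row = ["."] * total
            row[g:g + len(STAGES)] = STAGES
            timeline[name] = row
    return timeline


names = list("ABCDEFGHIJ")
timeline = pipelined_timeline(names, width=2)
print("clk " + " ".join(f"{c:>3}" for c in range(1, len(timeline["A"]) + 1)))
for name, row in timeline.items():
    print(f"{name:>3} " + " ".join(f"{s:>3}" for s in row))
```

With `width=1` (pipelining alone) the same ten instructions would need 14 clocks in this model, which shows how the two techniques stack.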
The Wikipedia articles for Superscalar and Instruction pipeline are pretty good.
A long time ago, CPUs executed only one machine instruction at a time. Only when it was completely finished did the CPU fetch the next instruction from memory (or, later, the instruction cache).
Eventually, someone noticed that this meant that most of a CPU did nothing most of the time, since there were several execution subunits (such as the instruction decoder, the integer arithmetic unit, the FP arithmetic unit, etc.) and executing an instruction kept only one of them busy at a time.
Thus, "simple" pipelining was born: once one instruction was done decoding and went on towards the next execution subunit, why not already fetch and decode the next instruction? If you had 10 such "stages", then by having each stage process a different instruction you could theoretically increase the instruction throughput tenfold without increasing the CPU clock at all! Of course, this only works flawlessly when there are no conditional jumps in the code (this led to a lot of extra effort to handle conditional jumps specially).
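The "tenfold" claim and the caveat about conditional jumps can be put into numbers with a toy cycle count. This is a simplified sketch, not a model of any real CPU: it assumes one clock per stage, and the jump frequency and penalty used below (one jump per five instructions, three wasted clocks each) are invented purely for illustration.

```python
def unpipelined_cycles(n, stages):
    """Each instruction runs start to finish before the next one begins."""
    return n * stages


def pipelined_cycles(n, stages, bubbles=0):
    """Fill the pipeline once, then finish one instruction per clock.

    `bubbles` counts clocks wasted on stalls or flushed work,
    for example after a conditional jump.
    """
    return stages + n - 1 + bubbles


def speedup(n, stages, bubbles=0):
    return unpipelined_cycles(n, stages) / pipelined_cycles(n, stages, bubbles)


n = 1_000_000
print(f"ideal 10-stage speedup: {speedup(n, 10):.3f}")

# Assumed numbers: one conditional jump per 5 instructions, 3 wasted clocks each.
wasted = (n // 5) * 3
print(f"with jump penalties:    {speedup(n, 10, bubbles=wasted):.3f}")
```

The ideal speedup approaches the stage count for long instruction streams, while even a modest penalty per jump pulls it well below that.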
Later, with Moore's law continuing to be correct for longer than expected, CPU makers found themselves with ever more transistors to make use of and thought "why have only one of each execution subunit?". Thus, superscalar CPUs with multiple execution subunits able to do the same thing in parallel were born, and CPU designs became much, much more complex to distribute instructions across these fully parallel units while ensuring the results were the same as if the instructions had been executed sequentially.
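To give a feel for the bookkeeping that "same results as sequential execution" requires, here is a minimal sketch of in-order dual issue. It is a deliberately simplified assumption, not how any particular CPU does it: instructions are issued together only if the later one neither reads nor writes a register the earlier one writes. The `Instr` type, the register names, and the helper functions are all hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str     # human-readable form, for printing only
    dest: str     # register written
    srcs: tuple   # registers read


def independent(earlier, later):
    """True if `later` can issue in the same clock as `earlier`."""
    raw = earlier.dest in later.srcs   # later reads what earlier writes
    waw = earlier.dest == later.dest   # both write the same register
    return not (raw or waw)


def issue_groups(program, width=2):
    """Greedy in-order grouping: add to the current group if it is legal."""
    groups = []
    for instr in program:
        current = groups[-1] if groups else None
        if (current is not None and len(current) < width
                and all(independent(prev, instr) for prev in current)):
            current.append(instr)
        else:
            groups.append([instr])
    return groups


program = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("sub r4, r5, r6", "r4", ("r5", "r6")),
    Instr("mul r7, r1, r4", "r7", ("r1", "r4")),
    Instr("add r8, r7, r2", "r8", ("r7", "r2")),   # needs r7 from the mul
    Instr("or  r9, r2, r3", "r9", ("r2", "r3")),
]

for clock, group in enumerate(issue_groups(program), start=1):
    print(clock, [i.text for i in group])
```

Here the `mul` ends up alone in its group because the `add` that follows depends on its result, so the second execution unit sits idle for that clock. Real designs add register renaming, out-of-order issue, and much more to recover such lost slots.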