I learned processor architecture 3 years ago.
To this day, I can't figure out why the execute stage is located before the memory stage in the sequential processing of an instruction.
When executing the instruction mov (%eax), %ebx, doesn't it need to access memory?
Thanks!
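Let's recall the classic RISC pipeline, which is the one usually studied: http://en.wikipedia.org/wiki/Classic_RISC_pipeline. Here are its stages: IF (instruction fetch), ID (instruction decode), EX (execute), MEM (memory access), WB (register write back).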
In RISC, only loads and stores can work with memory. For a memory access instruction, the EX stage computes the memory address (it takes the address from the register file, then scales it or adds an offset). The address is then passed to the MEM stage.
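The address computation done in EX can be sketched in a few lines of Python. This is only an illustration: the function name ex_stage_address, the register dictionary, and the index/scale/offset parameters are assumptions made for the example, not part of any real ISA definition.

```python
MASK32 = 0xFFFFFFFF  # assume 32-bit addresses, as with %eax


def ex_stage_address(regs, base, index=None, scale=1, offset=0):
    """Model of the EX stage for a load/store: the ALU builds the address.

    regs is a hypothetical register file (name -> value).
    """
    address = regs[base] + offset          # base register plus constant offset
    if index is not None:
        address += regs[index] * scale     # optional scaled index register
    return address & MASK32                # wrap around at 32 bits


regs = {"eax": 0x1000, "ecx": 3}
# mov (%eax), %ebx has no offset, so EX effectively computes eax + 0
print(hex(ex_stage_address(regs, "eax")))
print(hex(ex_stage_address(regs, "eax", offset=8)))
print(hex(ex_stage_address(regs, "eax", index="ecx", scale=4)))
```

The point is that the result of EX is an address, not data, which is why MEM has to come after it.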
Your example, mov (%eax), %ebx, is actually a load from memory without any additional computation, and it can be represented even in the RISC pipeline:
IF - get the instruction from instruction memory
ID - decode the instruction, pass the "eax" register to the ALU as an operand; remember "ebx" as the output for WB (in the control unit)
EX - compute "eax+0" in the ALU and pass the result to the next stage, MEM (as the address in memory)
MEM - take the address from the EX stage (from the ALU), go to memory and take the value (this stage can take several ticks to reach memory, blocking the pipeline). Pass the value to WB
WB - take the value from MEM and pass it back to the register file. The control unit should set the register file into the mode "Writing" + "EBX selected"
The situation is more complex for a true CISC instruction, e.g. add (%eax), %ebx (load word T from memory at [%eax], then store T+%ebx to %ebx). This instruction needs both an address computation and an addition in the ALU. This can't be easily represented in the simplest RISC (MIPS) pipelines.
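As a rough sketch, the five stages for the load mov (%eax), %ebx can be walked through in Python. The function run_load, the tuple used as the "instruction", and the dictionaries standing in for the register file and memory are all assumptions made for illustration; a real pipeline overlaps several instructions, which this does not model.

```python
def run_load(regs, memory, base, dest, offset=0):
    """Pass one load through IF, ID, EX, MEM, WB and return the stage order."""
    trace = []

    # IF: fetch the instruction (here just a hypothetical tuple)
    instr = ("load", base, offset, dest)
    trace.append("IF")

    # ID: decode, read the base register, remember the destination for WB
    _, base_reg, off, dest_reg = instr
    operand = regs[base_reg]
    trace.append("ID")

    # EX: the ALU computes the address (eax + 0 for mov (%eax), %ebx)
    address = operand + off
    trace.append("EX")

    # MEM: use the address produced by EX to read the value from memory
    value = memory[address]
    trace.append("MEM")

    # WB: write the loaded value into the destination register
    regs[dest_reg] = value
    trace.append("WB")
    return trace


regs = {"eax": 0x1000, "ebx": 0}
memory = {0x1000: 42}
print(run_load(regs, memory, "eax", "ebx"))
print(regs["ebx"])
```

Note that MEM consumes the address produced by EX. For add (%eax), %ebx a second ALU step would be needed after MEM, and this fixed stage order has no slot for it.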
The first x86 CPU (8086) was not pipelined; it executed only a single instruction at any moment. But since the 80386 there has been a pipeline with 6 stages, which is more complex than in RISC. There is a presentation about its pipeline, comparing it with MIPS: http://www.academic.marist.edu/~jzbv/architecture/Projects/projects2004/INTEL%20X86%20PIPELINING.ppt
Slide 17 says:
[...] mem and EX stages to avoid loads and stalls, but does create stalls for address computation
In my example, add will be executed in that combined "MEM+EX" stage for several CPU ticks, generating many stalls.
Modern x86 CPUs have very long pipelines (16 stages is typical), and they are RISC-like CPUs internally. The decoder stages (3 stages or more) break most complex x86 instructions into a series of internal RISC-like micro-operations (sometimes up to 450 micro-operations per instruction are generated with the help of microcode; 2-3 micro-operations is more typical). For complex ALU/MEM operations, there will be a micro-op for the address computation, then a micro-op for the memory load, and then a micro-op for the ALU action. Micro-operations have dependencies between them and are scheduled to different execution ports.
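The split of add (%eax), %ebx into dependent micro-operations can be illustrated with a small Python sketch. Everything here is hypothetical: the MicroOp class, the "agu"/"load"/"alu" kinds, and the temporary names t_addr and t_val are made up for the example, and real decoders use different (and undocumented) internal formats.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    kind: str        # "agu" (address computation), "load" or "alu"
    dest: str        # register or temporary written by this micro-op
    sources: tuple   # registers or temporaries it depends on


def decode_add_mem_reg(base, dest):
    """Split 'add (base), dest' into three micro-ops, each depending on the previous."""
    return [
        MicroOp("agu", "t_addr", (base,)),        # t_addr = base + 0
        MicroOp("load", "t_val", ("t_addr",)),    # t_val = memory[t_addr]
        MicroOp("alu", dest, ("t_val", dest)),    # dest = t_val + dest
    ]


def execute(uops, regs, memory):
    """Run the micro-ops in order, keeping temporaries out of the register file."""
    values = dict(regs)
    for u in uops:
        if u.kind == "agu":
            values[u.dest] = values[u.sources[0]]
        elif u.kind == "load":
            values[u.dest] = memory[values[u.sources[0]]]
        elif u.kind == "alu":
            values[u.dest] = (values[u.sources[0]] + values[u.sources[1]]) & 0xFFFFFFFF
    for name in regs:                  # copy back architectural registers only
        regs[name] = values[name]


regs = {"eax": 0x1000, "ebx": 5}
memory = {0x1000: 37}
execute(decode_add_mem_reg("eax", "ebx"), regs, memory)
print(regs["ebx"])
```

Each micro-op reads what the previous one wrote, which is the dependency chain mentioned above: address first, then the load, then the addition.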