I have learnd Processor Architecture 3 years ago.
Until today , I can't figure out why execute located before memory in the sequential instructions.
While executing the instruction [ mov (%eax) %ebx] , does it needn't to access memory?
Thanks!
Let's remember classic RISC pipeline, which is usually studied: http://en.wikipedia.org/wiki/Classic_RISC_pipeline. Here are its stages:
In RISC you can only have loads and stores to work with memory. And EX stage for memory access instruction will compute the address in memory (take address from register file, scale it or add offset). Then address will be passed to MEM stage.
Your example, mov (%eax), %ebx is actually a load from memory without any additional computation and it can be represented even in RISC pipeline:
IF - get the instruction from instruction memoryID - decode instruction, pass "eax" register to ALU as operand; remember "ebx" as output for WB (in control unit);EX - compute "eax+0" in ALU and pass result to next stage MEM (as address in memory)MEM - take address from EX stage (from ALU), go to memory and take value (this stage can take several ticks to reach memory with blocking of the pipeline). Pass value to WB
WB - take value from MEM and pass it back to register file. Control unit should set the register file into mode: "Writing"+"EBX selected"Situation is more complex in true CISC instruction, e.g. add (%eax), %ebx (load word T from [%eax] memory, then store T+%ebx to %ebx). This instruction needs both address computation and addition in ALU. This can't be easily represented in simplest RISC (MIPS) pipelines.
First x86 cpu (8086) was not pipelined, it executed only single instruction at any moment. But since 80386 there is pipeline with 6 stages, which is more complex than in RISC. There is presentation about its pipeline, comparing it with MIPS: http://www.academic.marist.edu/~jzbv/architecture/Projects/projects2004/INTEL%20X86%20PIPELINING.ppt
Slide 17 says:
mem and EX stages to avoid loads and stalls, but does create stalls for address computation In my example, add will be executed in that combined "MEM+EX" stage for several CPU ticks, generating many stalls.
Modern x86 CPUs have very long pipeline (16 stages is typical), and they are RISC-like cpus internally. Decoder stages (3 stage or more) will break most complex x86 instructions into series of internal RISC-like micro-operations (sometimes up to 450 microoperations per instruction are generated with help of microcode; more typical is 2-3 microoperations). For complex ALU/MEM operations, there will be microop for address computation, then microop for memory load and then microop for ALU action. Microoperations will have depends between them, and planned to different execution ports.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With