So I'm trying to understand the Tomasulo algorithm for out-of-order instruction execution. Here's what I'm getting so far:
Instructions are fetched in-order and stored in an instruction queue.
Register renaming happens somewhere next...? From what I understand this is to avoid WAR/WAW hazards by giving labels to the registers. Say you have add r1,r2,r3 (1) add r3,r5,r6 (2) You have a WAR hazard and need to ensure that instruction (1) reads the old value of r3 before adding it to r1. So I guess within the instruction queue (?) the hardware renames the registers, i.e. add r1,r2,r3#1 add r3#2,r5,r6 Or something like that.
Instructions are issued to reservation stations. From what I understand, each functional unit has its own set of reservation stations. But is it like a queue (FIFO) of instructions for that functional unit to execute when the appropriately tagged operands are available on the common data bus?
Since instructions can finish in arbitrary order (out of order), and more instructions can keep coming... Is there like a stage where the common data bus updates the register file before more instructions come in? I've heard a re-order buffer is used, which basically sorts the instructions back in order (this must imply instructions have a tag of some sort) and then the register results are committed back into the register file.
What I'm confused about is the implementation of register renaming, and the structure of the reservation stations.
Thank you for any help.
Every modern OoO exec CPU uses Tomasulo's algorithm for register renaming. The basic idea of renaming onto more physical registers in a kind of SSA dependency analysis hasn't changed.
In Tomasulo's algorithm, the control logic is distributed among the reservation stations, whereas in scoreboarding, the scoreboard keeps track of everything.
We will discuss the Tomasulo's technique here. The key idea is to allow instructions that have operands available to execute, even if the earlier instructions have not yet executed. For example, consider the following sequence of instructions: DIV.D F0 , F2 , F4. ADD.D F10 , F0 , F8.
The re-order buffer and in-order commit allow us to flush the speculative instructions from the machine when a misprediction is discovered. ROB is another possible source of operands ▪ ROB can provide precise exception in an out-of-order machine ▪ ROB allows us to ignore exceptions on speculative code.
Tomasulo's algorithm is really not tied to any specific hardware and in fact, in real machines, register renaming generally happens before instructions are inserted into the instruction queue. Let's get a couple of fundamental points across. First, the registers as expressed in your program are logical registers. Second, the actual hardware registers are called physical registers. Tomasulo's algorithm is simply a mechanism for mapping logical registers to physical registers. In real machines, there are generally two tables mapping logical registers to physical registers. One in the rename stage and one in the commit stage. There also must be more physical registers than logical ones. It works essentially this way:
The part about reservation stations is really a specific implementation of an out-of-order pipeline and essentially comes along with a physical register. There are plenty of machines that don't really have a notion of a reservation station.
Hennessy and Patterson's books on Computer Architecture are the standard textbook on this sort of stuff. I tried to explain this as simply as possible, but there are literally thousands of proposals for optimizing this stuff.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With