So I'm trying to understand the Tomasulo algorithm for out-of-order instruction execution. Here's what I'm getting so far: <ul> <li>Instructions are fetched in-order and stored in an instruction queue. </li> <li>Register renaming happens somewhere next...? From what I understand this is to avoid WAR/WAW hazards by giving labels to the registers. Say you have add r1,r2,r3 (1) add r3,r5,r6 (2) You have a WAR hazard and need to ensure that instruction (1) reads the old value of r3 before adding it to r1. So I guess within the instruction queue (?) the hardware renames the registers, i.e. add r1,r2,r3#1 add r3#2,r5,r6 Or something like that. </li> <li>Instructions are issued to reservation stations. From what I understand, each functional unit has its own set of reservation stations. But is it like a queue (FIFO) of instructions for that functional unit to execute when the appropriately tagged operands are available on the common data bus? </li> <li>Since instructions can finish in arbitrary order (out of order), and more instructions can keep coming... Is there like a stage where the common data bus updates the register file before more instructions come in? I've heard a re-order buffer is used, which basically sorts the instructions back in order (this must imply instructions have a tag of some sort) and then the register results are committed back into the register file. </li> </ul> What I'm confused about is the implementation of register renaming, and the structure of the reservation stations. Thank you for any help.

Tomasulo's algorithm is really not tied to any specific hardware and in fact, in real machines, register renaming generally happens before instructions are inserted into the instruction queue. Let's get a couple of fundamental points across. First, the registers as expressed in your program are logical registers. Second, the actual hardware registers are called physical registers. Tomasulo's algorithm is simply a mechanism for mapping logical registers to physical registers. In real machines, there are generally two tables mapping logical registers to physical registers. One in the rename stage and one in the commit stage. There also must be more physical registers than logical ones. It works essentially this way: <ol> <li>For each logical input, look up in the rename stage mapping table to find out which physical register currently holds that logical register value.</li> <li>For each logical output, find an unused physical register and map the output to it. Update the rename stage mapping table. If no physical register is available, wait until one becomes available.</li> <li>In the commit stage, when an instruction is committed, overwrite the old logical to physical mappings of the instruction's logical outputs with the new mappings. Free physical registers that were part of the old mappings.</li> <li>If there is a misprediction (or any kind of squash in the pipeline), the table in the commit stage overwrites the table in the rename stage and everything in the pipeline in-between is blown away.</li> </ol> The part about reservation stations is really a specific implementation of an out-of-order pipeline and essentially comes along with a physical register. There are plenty of machines that don't really have a notion of a reservation station. Hennessy and Patterson's books on Computer Architecture are the standard textbook on this sort of stuff. I tried to explain this as simply as possible, but there are literally thousands of proposals for optimizing this stuff.

Understanding the Tomasulo algorithm

Tags:

computer-science

computer-architecture

hardware

So I'm trying to understand the Tomasulo algorithm for out-of-order instruction execution. Here's what I'm getting so far:

Instructions are fetched in-order and stored in an instruction queue.
Register renaming happens somewhere next...? From what I understand this is to avoid WAR/WAW hazards by giving labels to the registers. Say you have add r1,r2,r3 (1) add r3,r5,r6 (2) You have a WAR hazard and need to ensure that instruction (1) reads the old value of r3 before adding it to r1. So I guess within the instruction queue (?) the hardware renames the registers, i.e. add r1,r2,r3#1 add r3#2,r5,r6 Or something like that.
Instructions are issued to reservation stations. From what I understand, each functional unit has its own set of reservation stations. But is it like a queue (FIFO) of instructions for that functional unit to execute when the appropriately tagged operands are available on the common data bus?
Since instructions can finish in arbitrary order (out of order), and more instructions can keep coming... Is there like a stage where the common data bus updates the register file before more instructions come in? I've heard a re-order buffer is used, which basically sorts the instructions back in order (this must imply instructions have a tag of some sort) and then the register results are committed back into the register file.

What I'm confused about is the implementation of register renaming, and the structure of the reservation stations.

Thank you for any help.

448

asked Jun 27 '12 04:06

JDS

1 Answers

Tomasulo's algorithm is really not tied to any specific hardware and in fact, in real machines, register renaming generally happens before instructions are inserted into the instruction queue. Let's get a couple of fundamental points across. First, the registers as expressed in your program are logical registers. Second, the actual hardware registers are called physical registers. Tomasulo's algorithm is simply a mechanism for mapping logical registers to physical registers. In real machines, there are generally two tables mapping logical registers to physical registers. One in the rename stage and one in the commit stage. There also must be more physical registers than logical ones. It works essentially this way:

For each logical input, look up in the rename stage mapping table to find out which physical register currently holds that logical register value.
For each logical output, find an unused physical register and map the output to it. Update the rename stage mapping table. If no physical register is available, wait until one becomes available.
In the commit stage, when an instruction is committed, overwrite the old logical to physical mappings of the instruction's logical outputs with the new mappings. Free physical registers that were part of the old mappings.
If there is a misprediction (or any kind of squash in the pipeline), the table in the commit stage overwrites the table in the rename stage and everything in the pipeline in-between is blown away.

The part about reservation stations is really a specific implementation of an out-of-order pipeline and essentially comes along with a physical register. There are plenty of machines that don't really have a notion of a reservation station.

Hennessy and Patterson's books on Computer Architecture are the standard textbook on this sort of stuff. I tried to explain this as simply as possible, but there are literally thousands of proposals for optimizing this stuff.

139

answered Nov 04 '22 22:11

Nathan Binkert

Related questions
                            
                                Listing all Smart Card Readers on a system (Alcor Micro reader issues)
                            
                                What hardware improvements should we make to speed up our build machine?
                            
                                Is it possible to control LCD components in software?
                            
                                Difference between MMU and memory controller
                            
                                hardware requirement for hadoop installation on Laptop
                            
                                How to get MAC-address in Rust?
                            
                                Powershell - check if a CD is in CD-ROM drive
                            
                                Leveraging high precision gpu's in tensor flow
                            
                                Flush & Reload cache side channel attack
                            
                                'if' vs 'when' for making multiplexer
                            
                                Automatic DPC latency testing - how?
                            
                                Could a Cray XK6 run a real-time raytracer?
                            
                                Can you piece together the following clues to help me read temperatures out of the Winbond W83793 chip?
                            
                                What is high-resolution performance counter?
                            
                                Beep function in VB6. What "beeps"?
                            
                                Does the hardware-prefetcher benefit in this memory access pattern?
                            
                                What is the status of the TSX-related Skylake errata SKL-105?
                            
                                How do I change my Active Sound Card on the Fly?
                            
                                Why is float division slow?
                            
                                Get manufacturer name in android device

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With