This is the second time I'm asking this question; the first time someone did reply, but I took too long to respond and never got a full understanding.
What I'm trying to do is learn more about the instruction fetch part of modern architectures. My assumption is that the branch predictor guides the instruction fetch unit, so every instruction is fetched along whatever path the predictor chooses.
The other gentleman who attempted to help mentioned something about a "branch instruction" also being sent along with the predicted instructions. This "branch instruction" tests whether the branch predictor's prediction was correct or not. I also assume that these branch instructions go to the branch execution unit and don't require any loads from memory.
What I don't understand is: how does the branch predictor know if it's not correct?
Thanks!
On the SPEC'89 benchmarks, very large bimodal predictors saturate at about 93.5% correct once every branch maps to its own counter. The predictor table is indexed with bits of the instruction's address, so the processor can fetch a prediction for every instruction before the instruction is decoded.
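As a rough sketch of what such a table might look like (the table size, index function, and counter width below are illustrative assumptions, not a description of any specific CPU):

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimal sketch of a bimodal predictor: a table of 2-bit saturating
 * counters indexed by low-order bits of the branch's address. The table
 * size and index function are assumptions for illustration only. */
#define TABLE_BITS 12
#define TABLE_SIZE (1u << TABLE_BITS)

static uint8_t counters[TABLE_SIZE];   /* 0,1 = predict not taken; 2,3 = predict taken */

static unsigned index_for(uint32_t branch_pc) {
    return (branch_pc >> 2) & (TABLE_SIZE - 1);   /* drop byte offset, keep low bits */
}

bool bimodal_predict(uint32_t branch_pc) {
    return counters[index_for(branch_pc)] >= 2;   /* upper half of the counter = taken */
}

void bimodal_train(uint32_t branch_pc, bool taken) {
    uint8_t *c = &counters[index_for(branch_pc)];
    if (taken  && *c < 3) (*c)++;                 /* saturate at 3 */
    if (!taken && *c > 0) (*c)--;                 /* saturate at 0 */
}
```

The 2-bit counter is what makes the predictor "bimodal": a branch has to go against its usual direction twice in a row before the prediction flips.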
Once the branch is actually decided, nothing happens if the prediction was correct; if the prediction was wrong, the instructions fetched down the wrong path are discarded and the pipeline starts fetching the correct instruction instead.
With a simple predict-not-taken scheme, a correct prediction, that is, a branch that is not taken, costs nothing. When the prediction is wrong, a bubble is inserted and the next instruction is fetched from the branch's target address.
A related but different technique is predication, which turns control dependencies into predicates, as in predicate logic: instructions carry a predicate and their results are only kept if the predicate is true, so the branch disappears entirely. Branch prediction instead guesses the branch outcome so that fetch never has to stall; because the pipeline stays full, the processor can work much more efficiently.
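To make the contrast concrete, here is a tiny hand-wavy C example (the function names are made up for illustration): the first version contains a branch the predictor has to guess, while the second computes a selection that compilers often turn into a conditional-move or predicated instruction, leaving nothing to predict.

```c
/* Branchy version: the CPU has to predict the outcome of x > limit
 * before the comparison has actually been computed. */
int clamp_branch(int x, int limit) {
    if (x > limit)
        return limit;
    return x;
}

/* Branchless version: both candidate values exist and one is selected.
 * Compilers often lower this to a conditional move (or to a predicated
 * instruction on ISAs that support them), leaving nothing to predict. */
int clamp_select(int x, int limit) {
    return x > limit ? limit : x;
}
```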
I think to answer your question you first need to understand how branch prediction works. To explain that, it's necessary to know why it makes those predictions in the first place. That requires an understanding of how the pipeline in modern processors works. The best way for me to explain that is to start with how a non-pipelined CPU processes instructions.
(Bear with me if I go over things you already know. It's not entirely clear where your confusion regarding branch prediction stems from.)
Older CPUs, along with many simple modern ones, process instructions in a fairly straightforward and obvious way. They break up the actions necessary to execute an instruction into a series of steps. Each instruction is run through these steps, and once it's done all of them it moves on to the next. For example, a hypothetical simple non-pipelined CPU might follow a series of steps like this:

1. Fetch the instruction from memory.
2. Decode the instruction to work out what it does.
3. Read the registers it uses as inputs.
4. Execute the operation (and access memory if it's a load or store).
5. Write the result back to a register.
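If it helps to see the same idea in code, here is a toy, completely made-up interpreter-style loop (the instruction encoding, opcodes, register count, and memory size are all assumptions for illustration) in which every step for one instruction finishes before the next instruction is even fetched:

```c
#include <stdint.h>

/* Toy non-pipelined CPU loop. No bounds checking; the point is only the
 * strictly sequential fetch/decode/execute/write-back structure. */
enum { OP_ADD = 0, OP_LOAD = 1, OP_HALT = 2 };

static uint32_t mem[1024];   /* word-addressed instruction/data memory */
static uint32_t regs[16];
static uint32_t pc;

void run(void) {
    for (;;) {
        uint32_t inst = mem[pc];                  /* 1. fetch */
        uint32_t op = inst >> 28;                 /* 2. decode (toy encoding) */
        uint32_t rd = (inst >> 24) & 0xF;
        uint32_t rs = (inst >> 20) & 0xF;
        uint32_t rt = (inst >> 16) & 0xF;

        switch (op) {                             /* 3-4. read registers, execute */
        case OP_ADD:  regs[rd] = regs[rs] + regs[rt]; break;
        case OP_LOAD: regs[rd] = mem[regs[rs]];       break;   /* memory access */
        case OP_HALT: return;
        }                                         /* 5. result has been written back */
        pc += 1;                                  /* only now move to the next one */
    }
}
```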
It's relatively simple to implement a CPU this way, but it makes very inefficient use of the processor's resources. While the CPU is performing one step in the process of executing an instruction, all the silicon used to implement the other steps is sitting idle.
Modern pipelined processors try to be more efficient by turning the sequence of steps necessary to execute instructions into something like an assembly line. Instructions go through a series of steps, or stages as they're called, just like in a non-pipelined CPU. The difference is that once an instruction has cleared the first stage of the pipeline, the CPU can send another instruction down. This allows several instructions to be in the pipeline at once, hopefully keeping all the parts of the chip well utilized. While an instruction still needs to go through a number of different stages, ideally instructions come out of the pipeline quickly, one after another. Pipelining doesn't improve the time to execute an instruction from start to finish; instead it shortens the time between instructions completing.
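To put rough numbers on that (purely illustrative, assuming one cycle per stage): with a 5-stage pipeline, a single instruction still takes 5 cycles from fetch to completion, but once the pipeline is full one instruction completes every cycle, so 100 back-to-back instructions finish in roughly 104 cycles instead of the 500 a non-pipelined design would need.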
Now this is a rather simplistic description of a modern pipeline, one that glosses over a number of issues complicating the design of a modern pipelined CPU. However, as far as branch prediction goes, there's only one complication that needs to be addressed: what to do when executing a branch instruction.
First off, nothing special needs to be done with the branch instruction itself. It can be tossed down the pipeline like any other. Once it's cleared the first stage the CPU can then send down the next instruction - and therein lies the problem. What is the next instruction? Since it's a branch it could go two different ways, and the CPU won't know which until the branch instruction finishes its journey through the pipeline.
The simple thing for the processor to do is wait. Since no other instructions are being fed into the pipeline while it waits, the pipeline empties out. Only when the branch instruction exits the (now empty) pipeline can the CPU resume putting instructions in. That next instruction has to go through all of the stages of the now empty pipeline, resulting in a delay between when the branch instruction finishes and the next instruction does. In this case the instructions don't exit the pipeline in quick succession as ideally they should.
This delay can be quite big on modern processors. It's a function of the number of stages in the pipeline, basically one cycle per stage. Most modern x86 CPUs have around 15 stages in their pipelines, so it would be very costly to implement branches that way. A simple CPU with a really short pipeline might be able to get away with always waiting, but modern processors have to do something else. They make a prediction.
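To get a feel for the cost (illustrative figures, not measurements of any particular chip): with a roughly 15-stage pipeline, each mispredicted branch wastes on the order of 15 cycles. If 20% of instructions are branches and the predictor is right 95% of the time, that works out to about 0.20 × 0.05 × 15 ≈ 0.15 wasted cycles per instruction, whereas always waiting for every branch to resolve would cost around 0.20 × 15 = 3 extra cycles per instruction.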
The simplest form of prediction is static branch prediction. The processor looks only at the branch instruction itself to guess whether it will be taken or not. The simplest form of static prediction is to assume that all branches are not taken, as this is the case more often than not. A more advanced static predictor assumes branches going backwards are taken and ones going forward are not. This rests on the assumption that backwards branches are loops, and loops are usually executed more than once.
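That heuristic is simple enough to sketch directly (the function below just illustrates the rule, not how any particular CPU implements it):

```c
#include <stdint.h>
#include <stdbool.h>

/* "Backwards taken, forwards not taken" static prediction. The only inputs
 * are the branch's own address and its target; no history is kept. */
bool static_predict(uint32_t branch_pc, uint32_t target_pc) {
    /* A target below the branch is a backwards branch - probably the bottom
     * of a loop, so predict taken. Forward branches are predicted not taken. */
    return target_pc < branch_pc;
}
```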
Static prediction can work well, but it's still going to make a lot of bad predictions. You can improve on this by employing some sort of dynamic branch prediction. There are all sorts of different ways this can be done, too many to mention here, but they all make a guess based on the history of which way previous branches have gone.
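One common family of schemes, sketched below purely as an example of "guessing from history", hashes the branch address together with a register of recent outcomes (a gshare-style predictor; all sizes are illustrative assumptions):

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of a gshare-style dynamic predictor: a global shift register of
 * recent branch outcomes is XORed with the branch address to choose a
 * 2-bit saturating counter. Sizes are assumptions for illustration. */
#define GSHARE_BITS 12
#define GSHARE_SIZE (1u << GSHARE_BITS)

static uint8_t  gshare_counters[GSHARE_SIZE];   /* 2-bit saturating counters */
static uint32_t global_history;                 /* last GSHARE_BITS outcomes */

static unsigned gshare_index(uint32_t pc) {
    return ((pc >> 2) ^ global_history) & (GSHARE_SIZE - 1);
}

bool gshare_predict(uint32_t pc) {
    return gshare_counters[gshare_index(pc)] >= 2;
}

void gshare_train(uint32_t pc, bool taken) {
    uint8_t *c = &gshare_counters[gshare_index(pc)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    /* Shift the actual outcome into the history register. */
    global_history = ((global_history << 1) | (taken ? 1u : 0u)) & (GSHARE_SIZE - 1);
}
```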
However the CPU ends up making the prediction, it proceeds as if it had made the correct guess. After the branch instruction is put in the pipeline, it starts sending down the instructions it assumes will be executed. This is called speculative execution, and the instructions processed like this are tagged as being speculative. This lets the pipeline know it shouldn't do anything with these instructions that can't be undone.
When the branch instruction gets to the end of the pipeline the CPU will now know whether its guess was correct. If it made the right prediction, it doesn't need to do anything; it can let the speculatively executed instructions finish their journey through the pipeline. Because it guessed correctly, the branch has no additional cost.
If it guessed wrong, it has to undo anything the speculatively executed instructions might have done, clear the pipeline, and begin sending down the instructions it was supposed to be executing. This results in the same delay as if it hadn't made a prediction at all.
So the answer to your question "How does the branch predictor know if it's not correct?" is that it doesn't know and doesn't care if it makes correct predictions or not. It just makes predictions. If it's a dynamic branch predictor then it will record whether the branch was taken or not taken in its history, but it won't record whether it made the correct decision.
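As a final hand-wavy sketch tying this together (predictor_train and pipeline_flush are made-up stand-ins for whatever the real hardware does): when a branch resolves, the predictor is always trained with the actual outcome, and correctness only decides whether the pipeline has to flush.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical glue for the point above; only the roles of these two
 * helpers matter, not their names or signatures. */
void predictor_train(uint32_t pc, bool taken);   /* e.g. a saturating-counter update */
void pipeline_flush(uint32_t correct_pc);        /* discard speculative work, refetch */

void resolve_branch(uint32_t branch_pc, uint32_t correct_pc,
                    bool predicted_taken, bool actually_taken) {
    if (predicted_taken != actually_taken)
        pipeline_flush(correct_pc);              /* correctness only triggers the flush */
    predictor_train(branch_pc, actually_taken);  /* history always records the outcome */
}
```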