How does the branch predictor know if it is not correct?

Tags:

This is the second time I'm asking this question; the first time someone did reply but I took too long to reply back to them and therefore didn't get the full understanding.

What I'm trying to do is learn more about the instruction fetch parts of modern architectures; to which I assume all instructions are predicted by the branch predictor for the instruction fetch unit to fetch as per prediction.

The other gentleman who attempted to helped mention something about a "branch instruction" also being sent along with the predicted instruction. This "branch instruction" tests the condition of the branch predictor's prediction as to whether it was correct or not. I also assume that these branch instructions go to the branch execution unit, and DONT require any loads from memory.

What I don't understand is:

How does the branch execution unit know whether the guess was correct or not with this instruction?
What happens once it knows it is correct?
Does a branch instruction get issued EVERY prediction (basically meaning... EVERY time any prediction is made?)
Must a branch prediction go before or after the predicted instruction?
Does a branch instruction require any data loaded from memory? If so, what is it?

Thanks!

323

asked Jul 20 '14 16:07

elemein

1 Answers

I think to answer your question you first need to understand how branch prediction works. To explain that it's necessary know why it makes those predictions in the first place. That requires an understanding of how the pipeline in modern processors works. The best way for me to explain that is to start with a how non-pipelined CPU process instructions.

(Bear with me if I go over things you already know. It's not entirely clear where your confusion regarding branch prediction stems from.)

Older CPUs, along with many simple modern ones, process instructions in a fairly straightforward and obvious way. They break up the actions necessary to execute an instruction into a series of steps. Each instruction is run through these steps and once it's done all of them it moves on to the next. For example, a hypothetical simple non-pipelined CPU might follow a series of steps like this:

Fetch the instruction into memory.
Read in the operands of the instruction.
Execute the operation for that instruction.
Write the result

It's relatively simple to implement a CPU this way, but it makes very inefficient use of the processor's resources. While the CPU is performing one step in process of executing and instruction, all the silicon used to implement the other steps is siting idle.

Modern pipelined processors try to be more efficient by turning the sequence of steps necessary to execute instructions into something like an assembly line. Instructions go through a series of steps, or stages are they're called, just like in a non-pipelined CPU. The difference is that once an instruction is has cleared the first stage of the pipeline, the CPU can send another instruction down. This allows several instructions to be in the pipeline at once, hopefully keeping all the parts of the chip well utilized. While an instruction still needs to go through a number of different stages, ideally instructions come out of the pipeline quickly one after each other. Pipelining doesn't improve the time to execute an instruction from start to finish, instead it shortens the time between instructions completing.

Now this a rather simplistic description of a modern pipeline, one that glosses over a number issues complicating the design of a modern pipelined CPU. However, as far branch prediction goes there's only one complication that needs to be addressed: what to do when executing a branch instruction.

First off, nothing special needs to be done with the branch instruction itself. It can be tossed down the pipeline like any other. Once it's cleared the first stage the CPU can then send down the next instruction - and therein lies the problem. What is the next instruction? Since it's a branch it could go two different ways, and the CPU won't know which until the branch instruction finishes its journey through the pipeline.

The simple thing for the processor to do is wait. Since no other instructions are being fed into the pipeline while it waits, the pipeline empties out. Only when the branch instruction exits the (now empty) pipeline can the CPU resume putting instructions in. That next instruction has to go through all of the stages of the now empty pipeline, resulting in a delay between when the branch instruction finishes and the next instruction does. In this case the instructions don't exit the pipeline in quick succession as ideally they should.

This delay can be quite big on modern processors. It's a function of the number stages in the pipeline, basically one cycle per stage. Most modern x86 CPUs have around 15 stages in their pipelines, so it would be very costly to implement branches that way. A simple CPU with a really short pipeline might be able to get away with always waiting, but modern processors have to do something else. They make a prediction.

The simplest form of prediction is a static branch prediction. The processor looks at only the branch instruction itself to guess whether it will be taken or not. The simplest form of static prediction is to assume that all branches are not taken, as this is the case more often than not. A more advanced static predictor assumes branches going backwards are taken and ones going forward are not. This makes the assumption that backwards branches are loops and loops are usually executed more than once.

Static prediction can work well, but it's still going to make a lot of bad predictions. You can improve on this by employing some sort of dynamic branch prediction. There's all sorts of different ways this can be done, too many to mention here, but they all make a guess based on the history of which way previous branches have gone.

However the CPU ends up making the prediction, it proceeds as if it had made the correct guess. After the branch instruction is put in the pipeline, it starts sending down the instructions it assumes will be executed. This is called speculative execution, and the instructions processed like this are tagged as being speculative. This lets the pipeline know it shouldn't do anything with these instructions that can't be undone.

When the branch instruction gets to the end of pipeline the CPU will now know whether its guess was correct. If it made the right prediction, it doesn't need to do anything, it can let the previous instruction finish the journey through the pipeline. Because it guessed correctly the branch has no additional cost.

If it guessed wrong it has to undo anything the speculatively executed instructions might have done, clear the pipeline and begin sending down the instructions it was supposed be executing. This results in the same delay as if it hadn't made a prediction at all.

So the answer to your question "How does the branch predictor know if it's not correct?" is that it doesn't know and doesn't care if it makes correct predictions or not. It just makes predictions. If it's a dynamic branch predictor then it will record whether the branch was taken or not taken in its history, but it won't record whether it made the correct decision.

167

answered Sep 28 '22 06:09

Ross Ridge

Related questions
                            
                                How to optimize a cycle?
                            
                                How are global variables stored?
                            
                                Intel 64 and IA-32 | Atomic operations including acquire / release semantic
                            
                                gcc assembly output
                            
                                Burst Mode Definition
                            
                                GNU C inline asm "m" constraint with a pointer: address vs. pointed-to value?
                            
                                Accessing class members from inline assembly in Visual C++
                            
                                Performance of modern processor
                            
                                How does x86 real-mode segments overlap help memory saving?
                            
                                Real Mode Assembly: Print Char to Screen without INT Instruction on Boot
                            
                                On x86 if [mem] is not 32-bit aligned, can "lock inc [mem]" still work fine?
                            
                                compiled c program memory addresses
                            
                                Why is the code adding 7 if the number is not >= 0
                            
                                jmp FWORD PTR [eax-0x67]?
                            
                                How to make x=2a+3b in 4 instructions limit using ONLY mov,add,sub,neg? [closed]
                            
                                manipulating c variable via inline assembly [duplicate]
                            
                                ARM Assembly: Absolute Value Function: Are two or three lines faster?
                            
                                Why some part of an os has to be written in assembly? [duplicate]
                            
                                x86 Assembly Force Cache Store
                            
                                Assembly language using shl to multiply by an odd number?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How does the branch predictor know if it is not correct?

Tags:

cpu-architecture

branch-prediction

assembly

cpu

elemein

People also ask

1 Answers

Ross Ridge

Recent Activity

Donate For Us