
What is it about CMOV which improves CPU pipeline performance?


I understand that when a branch is easily predicted, it's better to use an IF statement because the branch is essentially free. I have learnt that if the branch isn't easily predicted, then a CMOV is better. However, I do not quite understand how this is achieved.

Surely the underlying problem is still the same: we do not know the address of the next instruction to execute? So I don't understand how a CMOV, executed all the way down the pipeline, could have helped the instruction fetcher (10 CPU cycles earlier) choose the correct path and prevent a pipeline stall.

Could somebody please help me understand how CMOV improves branching?

user997112 asked Nov 25 '14 21:11



1 Answer

CMOV instructions don't direct the path of control flow. They are predicated instructions: they compute their result based on the condition codes. Some architectures (like 32-bit ARM) can predicate many kinds of instructions on condition codes, but x86 predication is essentially limited to "mov", that is, the conditional move (CMOV). A CMOV is decoded and executed like any other ALU instruction, with some latency, to determine its result.
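To make "computes a result instead of steering control flow" concrete, here is a minimal C sketch of what a conditional move does to its destination. The names `cmov_model` and `max_select` are made up for illustration, and whether a compiler actually emits a CMOV for this code is an assumption that depends on the compiler, flags, and target.

```c
#include <stdint.h>

/* Hypothetical model of a conditional move: dst keeps its old value
   unless cond is nonzero. Both inputs are already available as data,
   so nothing about the next instruction address depends on cond. */
static inline int64_t cmov_model(int cond, int64_t dst, int64_t src) {
    return cond ? src : dst;
}

/* max() written as a select. A compiler may lower this to cmp + cmov,
   but it is also free to use a branch; check the generated assembly. */
int64_t max_select(int64_t a, int64_t b) {
    return cmov_model(b > a, a, b);
}
```

The point of the model is that the result depends on `cond`, `dst`, and `src` as ordinary inputs, which is also why a CMOV has to wait for all three before it can finish.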

Branches, on the other hand, are predicted and actually steer the execution of instructions. The branch predictor "looks ahead" of the instruction fetcher, specifically looking for branch instructions, and predicts which path will be taken so that fetching can continue down it. Think of a railroad track where a person ahead shifts the tracks either left or right to tell the train where to go. If that person chooses the wrong direction, the train has to stop, back up, and then move again in the right direction. Lots of time wasted.

CMOVs, on the other hand, don't steer the flow. They are simply instructions that take extra time (and create additional dependencies) to figure out the proper result of the move based on the condition codes. Because the next instruction is the same whichever way the condition goes, the fetcher never has to guess, so there is nothing to mispredict. Think of the train, instead of deciding to go left or right, taking a straight path that requires no turn but is a bit slower (the real thing is obviously far more complicated, but it's the best analogy I can come up with right now).
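As a rough illustration of the trade-off, here are two ways to count the elements below a threshold, sketched in C. The function names are hypothetical, and the comments describe what a compiler might generate; an optimizing compiler can turn either form into the other (or vectorize both), so treat this as a sketch of the idea, not a guarantee about the emitted code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy form: the `if` may become a conditional jump, which the
   predictor has to guess on every iteration. Cheap when the data is
   predictable, costly when it is effectively random. */
size_t count_below_branchy(const int32_t *data, size_t n, int32_t threshold) {
    assert(data != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] < threshold) {
            count++;
        }
    }
    return count;
}

/* Branchless form: the comparison yields 0 or 1 and is added as data.
   Control flow is the same on every iteration, at the price of a
   dependency on the comparison result. */
size_t count_below_branchless(const int32_t *data, size_t n, int32_t threshold) {
    assert(data != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)(data[i] < threshold);
    }
    return count;
}
```

Both functions return the same value for the same input; which one is faster depends on how predictable the comparison is and on the specific CPU, so it is worth measuring rather than assuming.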

CMOVs used to be really bad (very high latency) on some older processors, but have since improved to be pretty fast, making them a lot more usable and worthwhile for performance.

Hope this helps.

drivingon9 answered Sep 27 '22 17:09