I know the kernel uses the likely
and unlikely
macros prodigiously. The docs for the macros are located at Built-in Function: long __builtin_expect (long exp, long c). But they don't really discuss the details.
How exactly does a compiler handle likely(x)
and __builtin_expect((x),1)
?
Is it handled by the code generator or the optimizer?
Does it depend upon optimization levels?
What's an example of the code generated?
You can use the __builtin_expect built-in function to indicate that an expression is likely to evaluate to a specified value. The compiler can use this knowledge to direct optimizations. This built-in function is portable with the GNU C/C++ __builtin_expect function.
C++ attribute: likely, unlikely (since C++20) 1) Applies to a statement to allow the compiler to optimize for the case where paths of execution including that statement are more likely than any alternative path of execution that does not include such a statement.
I just tested a simple example on gcc.
For x86 this seems to be handled by the optimizer and depend on optimization levels. Although I guess a correct answer here would be "it depends on the compiler".
The code generated is CPU dependent. Some cpus (sparc64 comes immediately to my mind, but I'm sure there are others) have flags on conditional branch instructions that tell the CPU how to predict it, so the compiler generates "predict true/predict false" instructions depending on the built in rules in the compiler and hints from the code (like __builtin_expect
).
Intel documents their behavior here: https://software.intel.com/en-us/articles/branch-and-loop-reorganization-to-prevent-mispredicts . In short the behavior on Intel CPUs is that if the CPU has no previous information about a branch it will predict forward branches as unlikely to be taken, while backwards branches are likely to be taken (think about loops vs. error handling).
This is some example code:
int bar(int);
int
foo(int x)
{
if (__builtin_expect(x>10, PREDICTION))
return bar(10);
return 42;
}
Compiled with (I'm using omit-frame-pointer to make the output more readable, but I still cleaned it up below):
$ cc -S -fomit-frame-pointer -O0 -DPREDICTION=0 -o 00.s foo.c
$ cc -S -fomit-frame-pointer -O0 -DPREDICTION=1 -o 01.s foo.c
$ cc -S -fomit-frame-pointer -O2 -DPREDICTION=0 -o 20.s foo.c
$ cc -S -fomit-frame-pointer -O2 -DPREDICTION=1 -o 21.s foo.c
There's no difference between 00.s and 01.s, so that shows that this is dependent on optimization (for gcc at least).
Here's the (cleaned up) generated code for 20.s:
foo:
cmpl $10, %edi
jg .L2
movl $42, %eax
ret
.L2:
movl $10, %edi
jmp bar
And here is 21.s:
foo:
cmpl $10, %edi
jle .L6
movl $10, %edi
jmp bar
.L6:
movl $42, %eax
ret
As expected the compiler rearranged the code so that the branch we don't expect to take is done in a forward branch.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With