While answering another question I got curious about this. I'm well aware that
if( __builtin_expect( !!a, 0 ) ) { // not likely } else { // quite likely }
will make the "quite likely" branch faster (in general) by doing something along the lines of hinting to the processor / changing the assembly code order / some kind of magic. (if anyone can clarify that magic that would also be great).
But does this work for a) inline ifs, b) variables and c) values other than 0 and 1? i.e. will
__builtin_expect( !!a, 0 ) ? /* unlikely */ : /* likely */;
or
int x = __builtin_expect( t / 10, 7 ); if( x == 7 ) { // likely } else { // unlikely }
or
if( __builtin_expect( a, 3 ) ) { // likely // uh-oh, what happens if a is 2? } else { // unlikely }
have any effect? And does all of this depend on the architecture being targeted?
Did you read the GCC documentation?
Built-in Function: long __builtin_expect (long exp, long c)
You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0)) foo ();
indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1)) foo (*ptr);
when testing pointer or floating-point values.
To explain this a bit... __builtin_expect is specifically useful for communicating which branch you think the program's likely to take. You ask how the compiler can use that insight - well, consider this code:
if (x == 0) return 10 * y; else return 39;
In machine code, the CPU can typically be asked to "goto" another line (which takes time, and depending on the CPU may prevent other execution optimisations - i.e. beneath the level of machine code - for example, see the Branches heading under http://en.wikipedia.org/wiki/Instruction_pipeline), or to call some other code, but there's not really an if/else concept where both true and false code are equal... you have to branch away to find the code for one or the other. The way that's done is basically, in pseudo-code:
test whether x is 0 if it was goto else_return_39 return 10 * y else_return_39: return 39
Given most CPUs are slower following the goto
down to the else_return_39:
label than to just fall through to return 10 * y
, code for the "true" branch will be reached faster than for the false branch. Of course, the machine code could test whether x is not 0, put the "false" code (return 39
) first and thereby reverse the performance characteristics.
This is what __builtin_expect controls - you can tell the compiler to put the true or the false branch where less branching is needed to reach it, thereby getting a tiny performance boost.
But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?
a) Whether the surrounding function is inlined or not doesn't change the need for branching where the if
statement appears (unless the optimiser sees the condition the if
statement tests is always true
or false
and only one branch could never run). So, it's equally applicable to inlined code.
[ Your comment shows you were interested in conditional expressions - a ? b : c
- I'm not sure - there's a disputed answer to that question at Can I use GCC's __builtin_expect() with ternary operator in C that might prove insightful one way or the other, or the basis for further exploration ]
b) variables - you postulated:
int x = __builtin_expect( t / 10, 7 ); if( x == 7 ) {
That won't work - the compiler's not obliged to associate such expectations with variables and remember them the next time an if
is seen. You can verify this (as I did for gcc 3.4.4) using gcc -S
to produce assembly language output: the assembly doesn't change regardless of the expected value.
c) values other than 0 and 1
It works for integral (long
) values, so yes. The last paragraph of the documentation quoted above address this, specifically:
you should use constructions such as
if (__builtin_expect (ptr != NULL, 1)) foo (*ptr);
when testing pointer or floating-point values.
Why? Well, if the pointer type is larger than long
, then calling __builtin_conversion(long, long)
would effectively chop off some of the less-significant bits and fail to incorporate the rest in the test. Similarly, floating point values might be larger than a long, and the conversion not produce the result you expect. By using a boolean expression such as ptr != NULL
(given true
converts to 1L and false
to 0) you're sure to get intended results.
But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?
It works for an expression context that is used to determine branching.
So, a) Yes. b) No. c) Yes.
And does all of this depend on the architecture being targeted?
Yes!
It leverages architectures that use instruction pipelining, which allow a CPU to begin working on upcoming instructions before the current instruction has been completed.
(if anyone can clarify that magic that would also be great).
("Branch prediction" complicates this description, so I'm intentionally omitting it)
Any code resembling an if statement implies that an expression may result in the CPU jumping to a different location in the program. These jumps invalidate what's in the CPU's instruction pipeline.
__builtin_expect
allows (without guarantee) gcc to try to assemble the code so the likely scenario involves fewer jumps than the alternate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With