Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How far does GCC's __builtin_expect go?

Tags:

While answering another question I got curious about this. I'm well aware that

if( __builtin_expect( !!a, 0 ) ) {     // not likely } else {     // quite likely } 

will make the "quite likely" branch faster (in general) by doing something along the lines of hinting to the processor / changing the assembly code order / some kind of magic. (if anyone can clarify that magic that would also be great).

But does this work for a) inline ifs, b) variables and c) values other than 0 and 1? i.e. will

__builtin_expect( !!a, 0 ) ? /* unlikely */ : /* likely */; 

or

int x = __builtin_expect( t / 10, 7 ); if( x == 7 ) {     // likely } else {     // unlikely } 

or

if( __builtin_expect( a, 3 ) ) {     // likely     // uh-oh, what happens if a is 2? } else {     // unlikely } 

have any effect? And does all of this depend on the architecture being targeted?

like image 959
Dave Avatar asked Mar 18 '13 00:03

Dave


Video Answer


2 Answers

Did you read the GCC documentation?

Built-in Function: long __builtin_expect (long exp, long c)

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

if (__builtin_expect (x, 0))     foo (); 

indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))     foo (*ptr); 

when testing pointer or floating-point values.

To explain this a bit... __builtin_expect is specifically useful for communicating which branch you think the program's likely to take. You ask how the compiler can use that insight - well, consider this code:

if (x == 0)     return 10 * y; else     return 39; 

In machine code, the CPU can typically be asked to "goto" another line (which takes time, and depending on the CPU may prevent other execution optimisations - i.e. beneath the level of machine code - for example, see the Branches heading under http://en.wikipedia.org/wiki/Instruction_pipeline), or to call some other code, but there's not really an if/else concept where both true and false code are equal... you have to branch away to find the code for one or the other. The way that's done is basically, in pseudo-code:

test whether x is 0 if it was goto else_return_39 return 10 * y else_return_39: return 39 

Given most CPUs are slower following the goto down to the else_return_39: label than to just fall through to return 10 * y, code for the "true" branch will be reached faster than for the false branch. Of course, the machine code could test whether x is not 0, put the "false" code (return 39) first and thereby reverse the performance characteristics.

This is what __builtin_expect controls - you can tell the compiler to put the true or the false branch where less branching is needed to reach it, thereby getting a tiny performance boost.

But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?

a) Whether the surrounding function is inlined or not doesn't change the need for branching where the if statement appears (unless the optimiser sees the condition the if statement tests is always true or false and only one branch could never run). So, it's equally applicable to inlined code.

[ Your comment shows you were interested in conditional expressions - a ? b : c - I'm not sure - there's a disputed answer to that question at Can I use GCC's __builtin_expect() with ternary operator in C that might prove insightful one way or the other, or the basis for further exploration ]

b) variables - you postulated:

int x = __builtin_expect( t / 10, 7 ); if( x == 7 ) { 

That won't work - the compiler's not obliged to associate such expectations with variables and remember them the next time an if is seen. You can verify this (as I did for gcc 3.4.4) using gcc -S to produce assembly language output: the assembly doesn't change regardless of the expected value.

c) values other than 0 and 1

It works for integral (long) values, so yes. The last paragraph of the documentation quoted above address this, specifically:

you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))     foo (*ptr); 

when testing pointer or floating-point values.

Why? Well, if the pointer type is larger than long, then calling __builtin_conversion(long, long) would effectively chop off some of the less-significant bits and fail to incorporate the rest in the test. Similarly, floating point values might be larger than a long, and the conversion not produce the result you expect. By using a boolean expression such as ptr != NULL (given true converts to 1L and false to 0) you're sure to get intended results.

like image 176
Tony Delroy Avatar answered Sep 20 '22 03:09

Tony Delroy


But does this work for a) inline ifs, b) variables and c) values other than 0 and 1?

It works for an expression context that is used to determine branching.

So, a) Yes. b) No. c) Yes.

And does all of this depend on the architecture being targeted?

Yes!

It leverages architectures that use instruction pipelining, which allow a CPU to begin working on upcoming instructions before the current instruction has been completed.

(if anyone can clarify that magic that would also be great).

("Branch prediction" complicates this description, so I'm intentionally omitting it)

Any code resembling an if statement implies that an expression may result in the CPU jumping to a different location in the program. These jumps invalidate what's in the CPU's instruction pipeline.

__builtin_expect allows (without guarantee) gcc to try to assemble the code so the likely scenario involves fewer jumps than the alternate.

like image 39
Drew Dormann Avatar answered Sep 19 '22 03:09

Drew Dormann