Suppose I have an expression which only a part of is very unlikely, but the other is statistically neutral:
if (could_be || very_improbable) {
DoSomething();
}
Would it help the compiler in any way if I place the very improbable bit in an unlikely()
macro?
if (could_be || unlikely(very_improbable)) {
DoSomething();
}
Note: I'm not asking how the marcos work - I understand that. The question here is regarding GCC and will it be able to optimize the expression if I only hint about part of it. I also recognize that it might heavily depend on the expressions in question - I'm appealing to those who have experience with these macros.
C++ attribute: likely, unlikely (since C++20) 1) Applies to a statement to allow the compiler to optimize for the case where paths of execution including that statement are more likely than any alternative path of execution that does not include such a statement.
A look at the Linux kernel code will show many if conditions enclosed in likely and unlikely macros. These macros invoke compiler directives that give the compiler a hint on the code leg that should be optimized for performance.
Yes, it is reasonable, and compilers can and do take advantage of it in the right scenario.
In your actual example, if could_be
and very_improbable
are actually integral variables, there isn't going to be any point to inserting likely
or unlikely
macros on a sub-expression of the predicate, because what can the compiler really do about making this faster? The compiler can organize the if
block differently depending on the likely outcome of the branch, but just because very_improbably
is unlikely doesn't help: it still needs to generate the code to test it.
Let's take an example where the compiler can do more work:
extern int fn1();
extern int fn2();
extern int f(int x);
int test_likely(int a, int b) {
if (likely(f(a)) && unlikely(f(b)))
return fn1();
return fn2();
}
Here the predicate is composed of two calls to f()
with arguments, and icc
produces different code for 3 out of the 4 combinations of likely
and unlikely
:
Code produced for likely(f(a)) && likely(f(b))
:
test_likely(int, int):
push r15 #8.31
mov r15d, esi #8.31
call f(int) #9.7
test eax, eax #9.7
je ..B1.7 # Prob 5% #9.7
mov edi, r15d #9.23
call f(int) #9.23
test eax, eax #9.23
je ..B1.7 # Prob 5% #9.23
pop r15 #10.12
jmp fn1() #10.12
..B1.7: # Preds ..B1.4 ..B1.2
pop r15 #11.10
jmp fn2() #11.10
Here, both predicates are likely true, so icc
produces straight-line code for the case where both are true, jumping out-of-line if either turns out to be false.
Code produced for unlikely(f(a)) && likely(f(b))
:
test_likely(int, int):
push r15 #8.31
mov r15d, esi #8.31
call f(int) #9.7
test eax, eax #9.7
jne ..B1.5 # Prob 5% #9.7
..B1.3: # Preds ..B1.6 ..B1.2
pop r15 #11.10
jmp fn2() #11.10
..B1.5: # Preds ..B1.2
mov edi, r15d #9.25
call f(int) #9.25
test eax, eax #9.25
je ..B1.3 # Prob 5% #9.25
pop r15 #10.12
jmp fn1() #10.12
Now, the predicate is likely false, so icc
produces straight-line code leading directly to a return in that case, and jumps out of line to B1.5
to continue the predicate. In that case, it expects the second call (f(b)
) to be true, so it generates fall through code ending in a tail-call to fn1()
. If the second call turns out false, it jumps back up to the same sequence already assembled for the fall-though case in the first jump (label B1.3
).
This turns out also to be the code generated for unlikely(f(a)) && unlikely(f(b))
. In this case, you could imagine the compiler changing the end of the code to put a jmp fn2()
as the fall-through case, but it doesn't. It is key to notice that this would prevent re-using of the earlier sequence at B1.3
and it is also unlikely that we are even executing this code, so it seems reasonable to favor smaller code size over optimizing an already unlikely case.
Code produced for likely(f(a)) && unlikely(f(b))
:
test_likely(int, int):
push r15 #8.31
mov r15d, esi #8.31
call f(int) #9.7
test eax, eax #9.7
je ..B1.5 # Prob 5% #9.7
mov edi, r15d #9.23
call f(int) #9.23
test eax, eax #9.23
jne ..B1.7 # Prob 5% #9.23
..B1.5: # Preds ..B1.4 ..B1.2
pop r15 #11.10
jmp fn2() #11.10
..B1.7: # Preds ..B1.4
pop r15 #10.12
jmp fn1() #10.12
This is similar to the first case (likely && likely
), except that the expectation for the second predicate is now false, so it re-orders the blocks so that the return fn2()
case is the fall-through one.
So compilers definitely can use precise likely
and unlikely
information, and indeed it makes sense: if you broke the above test up into two chained if
statements, it's pretty obvious that separate branch hints would work, so it isn't surprising that the semantically equivalent use of &&
still benefits from hints.
Here are a few other notes that didn't get "full text" treatment, in case you got this far:
icc
to illustrate the examples, but for this test at least both clang
and gcc
make the same basic optimizations (compiling 3 out of 4 cases differently).likely(X) && unlikely(Y)
, you can check condition Y
first, since it's very likely to allow you shortcut checking Y1. Apparently, gcc can make this optimization for simple predicates, but I wasn't able to coax icc or clang into doing it. The gcc optimization is apparently quite fragile: it disappears if you change the predicate a bit, even though the optimization would be much better in that case.1Of course, this is only allowed when the compiler can see that X
and Y
have no side effects, and it may not be effective if Y
is much more expensive to check compared to X
(since any the benefit of avoiding the check to Y
is overwhelmed by the high cost of additional X
evaluations).
Yes, this may help. For example in the following example, when XXX
is off, GCC will test x
before unexpected y > 0
thus avoiding execution of unlikely code at runtime (Cygwin, gcc 5.4). Of course in this particular case y
check has been written before x
check but this isn't critical as during codegen your code may be re-shuffled by GCC in unpredictable ways anyway.
#ifdef XXX
#define __builtin_expect(x, y) (x)
#endif
void bar();
void foo(int x, int y, int z) {
if(__builtin_expect(y > 0, 0) || x == 0)
bar ();
}
When XXX
is off (i.e. __builtin_expect
is active):
foo:
testl %ecx, %ecx
je .L4
testl %edx, %edx
jg .L4
rep ret
When XXX
is defined (i.e. __builtin_expect
ignored):
foo:
testl %edx, %edx
jg .L4
testl %ecx, %ecx
je .L4
rep ret
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With