 

Is it reasonable to mark only part of an expression as likely()/unlikely()

Suppose I have an expression of which only one part is very unlikely, while the other part is statistically neutral:

if (could_be || very_improbable) {
    DoSomething();
}

Would it help the compiler in any way if I place the very improbable bit in an unlikely() macro?

if (could_be || unlikely(very_improbable)) {
    DoSomething();
}

Note: I'm not asking how the macros work - I understand that. The question here is specifically about GCC: will it be able to optimize the expression if I only hint about part of it? I also recognize that it might heavily depend on the expressions in question - I'm appealing to those who have experience with these macros.

asked Jan 29 '17 by immortal


2 Answers

Yes, it is reasonable, and compilers can and do take advantage of it in the right scenario.

In your actual example, if could_be and very_improbable are plain integral variables, there isn't much point in putting likely or unlikely on a sub-expression of the predicate, because what can the compiler really do to make this faster? The compiler can lay out the if block differently depending on the likely outcome of the branch, but the mere fact that very_improbable is unlikely doesn't help: the compiler still has to generate the code that tests it.

Let's take an example where the compiler can do more work:

/* The usual definitions (as in the Linux kernel): */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

extern int fn1();
extern int fn2();
extern int f(int x);

int test_likely(int a, int b) {
  if (likely(f(a)) && unlikely(f(b)))
    return fn1();
  return fn2();
}

Here the predicate is composed of two calls to f() with arguments, and icc produces different code for 3 out of the 4 combinations of likely and unlikely:

Code produced for likely(f(a)) && likely(f(b)):

test_likely(int, int):
        push      r15                                           #8.31
        mov       r15d, esi                                     #8.31
        call      f(int)                                         #9.7
        test      eax, eax                                      #9.7
        je        ..B1.7        # Prob 5%                       #9.7
        mov       edi, r15d                                     #9.23
        call      f(int)                                         #9.23
        test      eax, eax                                      #9.23
        je        ..B1.7        # Prob 5%                       #9.23
        pop       r15                                           #10.12
        jmp       fn1()                                       #10.12
..B1.7:                         # Preds ..B1.4 ..B1.2
        pop       r15                                           #11.10
        jmp       fn2()                                       #11.10

Here, both predicates are likely true, so icc produces straight-line code for the case where both are true, jumping out-of-line if either turns out to be false.

Code produced for unlikely(f(a)) && likely(f(b)):

test_likely(int, int):
        push      r15                                           #8.31
        mov       r15d, esi                                     #8.31
        call      f(int)                                         #9.7
        test      eax, eax                                      #9.7
        jne       ..B1.5        # Prob 5%                       #9.7
..B1.3:                         # Preds ..B1.6 ..B1.2
        pop       r15                                           #11.10
        jmp       fn2()                                       #11.10
..B1.5:                         # Preds ..B1.2
        mov       edi, r15d                                     #9.25
        call      f(int)                                         #9.25
        test      eax, eax                                      #9.25
        je        ..B1.3        # Prob 5%                       #9.25
        pop       r15                                           #10.12
        jmp       fn1()                                       #10.12

Now, the predicate is likely false, so icc produces straight-line code leading directly to a return in that case, and jumps out of line to B1.5 to continue evaluating the predicate. There, it expects the second call (f(b)) to be true, so it generates fall-through code ending in a tail-call to fn1(). If the second call turns out false, it jumps back up to the sequence already assembled for the fall-through case of the first test (label B1.3).

This also turns out to be the code generated for unlikely(f(a)) && unlikely(f(b)). In this case, you could imagine the compiler reordering the end of the code to put a jmp fn2() as the fall-through case, but it doesn't. The key observation is that doing so would prevent re-use of the earlier sequence at B1.3, and it is unlikely that this code executes at all, so it seems reasonable to favor smaller code size over optimizing an already-unlikely case.

Code produced for likely(f(a)) && unlikely(f(b)):

test_likely(int, int):
        push      r15                                           #8.31
        mov       r15d, esi                                     #8.31
        call      f(int)                                         #9.7
        test      eax, eax                                      #9.7
        je        ..B1.5        # Prob 5%                       #9.7
        mov       edi, r15d                                     #9.23
        call      f(int)                                         #9.23
        test      eax, eax                                      #9.23
        jne       ..B1.7        # Prob 5%                       #9.23
..B1.5:                         # Preds ..B1.4 ..B1.2
        pop       r15                                           #11.10
        jmp       fn2()                                       #11.10
..B1.7:                         # Preds ..B1.4
        pop       r15                                           #10.12
        jmp       fn1()                                       #10.12

This is similar to the first case (likely && likely), except that the expectation for the second predicate is now false, so it re-orders the blocks so that the return fn2() case is the fall-through one.

So compilers definitely can use precise likely and unlikely information, and indeed it makes sense: if you broke the above test up into two chained if statements, it's pretty obvious that separate branch hints would work, so it isn't surprising that the semantically equivalent use of && still benefits from hints.
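As a sketch of that equivalence, here is the chained-if form with the hints on separate statements. The likely/unlikely definitions and the stub bodies for f, fn1, and fn2 are assumptions added only to make the snippet self-contained (the original example leaves them extern):

```c
#include <assert.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in definitions so the sketch compiles on its own. */
static int f(int x)  { return x > 0; }
static int fn1(void) { return 1; }
static int fn2(void) { return 2; }

/* Semantically equivalent to
     if (likely(f(a)) && unlikely(f(b))) return fn1(); return fn2();
   but with each hint attached to its own if statement. */
int test_chained(int a, int b) {
  if (likely(f(a))) {
    if (unlikely(f(b)))
      return fn1();
  }
  return fn2();
}
```

Each branch now carries exactly one hint, so it is unsurprising that the compiler can exploit the same information when the two tests are fused with &&.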

Here are a few other notes that didn't get "full text" treatment, in case you got this far:

  • I used icc to illustrate the examples, but for this test at least both clang and gcc make the same basic optimizations (compiling 3 out of 4 cases differently).
  • One "obvious" optimization a compiler could make, knowing the probabilities of the sub-predicates, is to reverse their order. For example, given likely(X) && unlikely(Y), it could check condition Y first, since Y is very likely to be false and would let it short-circuit the check of X1. Apparently gcc can make this optimization for simple predicates, but I wasn't able to coax icc or clang into doing it. The gcc optimization is apparently quite fragile: it disappears if you change the predicate a bit, even when the optimization would be much more valuable in that case.
  • Compilers are restricted from making optimizations when they can't guarantee that the transformed code will behave "as if" it were compiled directly according to the language semantics. In particular, they have limited ability to reorder operations unless they can prove the operations have no side-effects. Keep that in mind when structuring your predicates.

1Of course, this is only allowed when the compiler can see that X and Y have no side effects, and it may not pay off if Y is much more expensive to evaluate than X (since the benefit of skipping the X check is overwhelmed by the high cost of the additional Y evaluations).
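A minimal sketch of that side-effect constraint (g, h, and the call counters are hypothetical names, not from the answer above): because g() has an observable side effect, the short-circuit semantics of && force it to be evaluated first and h() to be skipped when g() is false, no matter what the hints suggest.

```c
#include <assert.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int g_calls = 0;
static int h_calls = 0;

/* g() has an observable side effect (a call counter), so the compiler
   may not hoist h() in front of it, whatever the hints say. */
static int g(int x) { g_calls++; return x > 0; }
static int h(int x) { h_calls++; return x != 0; }

int check(int x, int y) {
  /* && must evaluate g() first and must skip h() when g() is false. */
  return likely(g(x)) && unlikely(h(y));
}
```

If g() and h() were provably side-effect-free, the compiler would be free to swap them; here it is not.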

answered Oct 13 '22 by BeeOnRope


Yes, this may help. For example, in the code below, when XXX is not defined (i.e. __builtin_expect is active), GCC will test x before the unexpected y > 0, thus avoiding evaluation of the unlikely condition at runtime (Cygwin, gcc 5.4). In this particular case the y check is written before the x check in the source, but that isn't critical: during codegen your code may be re-shuffled by GCC in unpredictable ways anyway.

#ifdef XXX
#define __builtin_expect(x, y) (x)
#endif

void bar();

void foo(int x, int y, int z) {
  if(__builtin_expect(y > 0, 0) || x == 0)
    bar ();
}

When XXX is not defined (i.e. __builtin_expect is active):

foo:
  testl   %ecx, %ecx
  je      .L4
  testl   %edx, %edx
  jg      .L4
  rep ret

When XXX is defined (i.e. __builtin_expect is ignored):

foo:
  testl   %edx, %edx
  jg      .L4
  testl   %ecx, %ecx
  je      .L4
  rep ret

answered Oct 13 '22 by yugr