Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are compilers able to avoid branching instructions?

I've been reading about bit twiddling hacks and thought, are compilers able to avoid branching in the following code:

constexpr int min(const int lhs, const int rhs) noexcept {
    if (lhs < rhs) {
        return lhs;
    }
    return rhs
}

by replacing it with (explanation):

constexpr int min(const int lhs, const int rhs) noexcept {
    return rhs ^ ((lhs ^ rhs) & -(lhs < rhs));
}
like image 311
Kostya Avatar asked Dec 20 '22 00:12

Kostya


2 Answers

  • Are compilers able to...: Yes definitively!
  • Can I rely on those optimizations?: No, you can't rely on any optimization. There may always be some strange conditions under which a compiler chooses not to implement a certain optimization for some non-obvious reason or just fails to see the possibility. Also in general, I've made the observation that compilers sometimes are a lot dumber than people think (Or the people (including me) are dumber than they think).(1)
  • Not asked, but a very important aspect: Can I rely on this actually being an optimization? NO! For one, (especially on x86) performance always depends on the surrounding code and there are a lot of different optimizations that interact. Also, some architectures might even offer commands that implement the operation even more efficient.
  • Should I use bit twiddeling optimizations?: In general: No - especially not without verifying that they actually give you any benefit! Even when they do improve performance, it makes your code harder to read and review and it makes some architecture and compiler specific assumptions (representation of integers, execution time of instructions, penalty for branch miss-prediction ...), that might lead to worse performance when you port your code to another architecture or - in the worst case even lead to false results.

My advice:
If you need to get the last bit of performance for a specific system, then just try both variants and measure (and verify the result every time, you update your CPU and/or compiler). For any other case, assume that the compiler is at least as good in making low level optimizations as you are. I'd also suggest that you first learn about all optimization related compiler flags and set up a proper benchmark BEFORE starting to use low level optimizations in any case.

I think the only area, where hand optimizations are still sometimes beneficial is if you want to to optimally use vector units. Modern compiler can auto-vectorize many things, but this is still relatively new field and there are certain things which the compiler is just not allowed to do because it violates some guarantees from the standard (especially where floating point operations are concerned).

(1) Some people seem to think, that independent of what their code looks like, the compiler will always produce the optimal code that provides the same semantics. For one, there are limits to what a compiler can do in a limited amount of time (there are a lot of heuristics that work MOST of the time but not always). Second, in many cases, the c++ standard requires the compiler to give certain guarantees, that you are not actually interested in at the moment, but still prevent optimizations.

like image 159
MikeMB Avatar answered Dec 21 '22 14:12

MikeMB


clang++ (3.5.2-1) seems to be smart enough -O3 (I'm not using c++11 or c++14, constexpr and noexcept removed from source):

08048760 <_Z3minii>:
 8048760:   8b 44 24 08             mov    0x8(%esp),%eax
 8048764:   8b 4c 24 04             mov    0x4(%esp),%ecx
 8048768:   39 c1                   cmp    %eax,%ecx
 804876a:   0f 4e c1                cmovle %ecx,%eax
 804876d:   c3                      ret    

gcc (4.9.3) (-O3) instead do branch with jle:

08048740 <_Z3minii>:
 8048740:       8b 54 24 08             mov    0x8(%esp),%edx
 8048744:       8b 44 24 04             mov    0x4(%esp),%eax
 8048748:       39 d0                   cmp    %edx,%eax
 804874a:       7e 02                   jle    804874e <_Z3minii+0xe>
 804874c:       89 d0                   mov    %edx,%eax
 804874e:       f3 c3                   repz ret 

(x86 32bit)

This min2 (mangled) is the bit alternative (from gcc):

08048750 <_Z4min2ii>:
 8048750:       8b 44 24 08             mov    0x8(%esp),%eax
 8048754:       8b 54 24 04             mov    0x4(%esp),%edx
 8048758:       31 c9                   xor    %ecx,%ecx
 804875a:       39 c2                   cmp    %eax,%edx
 804875c:       0f 9c c1                setl   %cl
 804875f:       31 c2                   xor    %eax,%edx
 8048761:       f7 d9                   neg    %ecx
 8048763:       21 ca                   and    %ecx,%edx
 8048765:       31 d0                   xor    %edx,%eax
 8048767:       c3                      ret    
like image 20
Alex Avatar answered Dec 21 '22 13:12

Alex