I've been reading about bit twiddling hacks and thought, are compilers able to avoid branching in the following code:
constexpr int min(const int lhs, const int rhs) noexcept {
if (lhs < rhs) {
return lhs;
}
return rhs
}
by replacing it with (explanation):
constexpr int min(const int lhs, const int rhs) noexcept {
return rhs ^ ((lhs ^ rhs) & -(lhs < rhs));
}
My advice:
If you need to get the last bit of performance for a specific system, then just try both variants and measure (and verify the result every time, you update your CPU and/or compiler). For any other case, assume that the compiler is at least as good in making low level optimizations as you are. I'd also suggest that you first learn about all optimization related compiler flags and set up a proper benchmark BEFORE starting to use low level optimizations in any case.
I think the only area, where hand optimizations are still sometimes beneficial is if you want to to optimally use vector units. Modern compiler can auto-vectorize many things, but this is still relatively new field and there are certain things which the compiler is just not allowed to do because it violates some guarantees from the standard (especially where floating point operations are concerned).
(1) Some people seem to think, that independent of what their code looks like, the compiler will always produce the optimal code that provides the same semantics. For one, there are limits to what a compiler can do in a limited amount of time (there are a lot of heuristics that work MOST of the time but not always). Second, in many cases, the c++ standard requires the compiler to give certain guarantees, that you are not actually interested in at the moment, but still prevent optimizations.
clang++
(3.5.2-1) seems to be smart enough -O3
(I'm not using c++11 or c++14, constexpr
and noexcept
removed from source):
08048760 <_Z3minii>:
8048760: 8b 44 24 08 mov 0x8(%esp),%eax
8048764: 8b 4c 24 04 mov 0x4(%esp),%ecx
8048768: 39 c1 cmp %eax,%ecx
804876a: 0f 4e c1 cmovle %ecx,%eax
804876d: c3 ret
gcc
(4.9.3) (-O3
) instead do branch with jle
:
08048740 <_Z3minii>:
8048740: 8b 54 24 08 mov 0x8(%esp),%edx
8048744: 8b 44 24 04 mov 0x4(%esp),%eax
8048748: 39 d0 cmp %edx,%eax
804874a: 7e 02 jle 804874e <_Z3minii+0xe>
804874c: 89 d0 mov %edx,%eax
804874e: f3 c3 repz ret
(x86 32bit)
This min2
(mangled) is the bit alternative (from gcc
):
08048750 <_Z4min2ii>:
8048750: 8b 44 24 08 mov 0x8(%esp),%eax
8048754: 8b 54 24 04 mov 0x4(%esp),%edx
8048758: 31 c9 xor %ecx,%ecx
804875a: 39 c2 cmp %eax,%edx
804875c: 0f 9c c1 setl %cl
804875f: 31 c2 xor %eax,%edx
8048761: f7 d9 neg %ecx
8048763: 21 ca and %ecx,%edx
8048765: 31 d0 xor %edx,%eax
8048767: c3 ret
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With