I have to compute the difference between two uint8_t variables in an extremely efficiency-sensitive area of code. It is imperative that I find the fastest way possible to execute this computation. This is for a program written in C++ running on Ubuntu.
I currently use the following macro:
#define UINT8_T_DIFF(a, b) (static_cast<uint8_t>((((a) > (b)) ? ((a) - (b)) : ((b) - (a)))))
This macro yields the answer I need, but is there anything I can do to make this computation faster?
Note that the static_cast is in the macro because without it I get this compiler error:
conversion to 'uint8_t {aka unsigned char}' from 'int' may alter its value [-Werror=conversion]
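Separate from raw speed, the macro evaluates each argument more than once. A minimal sketch of the same branch-based logic as an inline function is below; the name uint8_diff is hypothetical. Because it is constexpr, an optimizing compiler can inline it just like the macro, and the explicit cast still satisfies -Werror=conversion.

```cpp
#include <cstdint>

// Hypothetical name: uint8_diff. Same logic as the UINT8_T_DIFF macro, but
// each argument is evaluated exactly once because it is a function parameter.
constexpr std::uint8_t uint8_diff(std::uint8_t a, std::uint8_t b) {
    // a - b and b - a are computed in int (integer promotion). The selected
    // branch is always in [0, 255], so the explicit cast back is value-preserving.
    return static_cast<std::uint8_t>(a > b ? a - b : b - a);
}
```

For example, uint8_diff(3, 10) and uint8_diff(10, 3) both return 7.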
As a general rule, compilers usually generate the most optimized assembly when you write your source in a way that most obviously communicates your intent. This is what you want to do:
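#include <cstdint>
#include <cstdlib>

uint8_t diff(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(std::abs(a - b));  // a - b is computed in int, so it cannot wrap; the cast satisfies -Wconversion
}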
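Because a uint8_t has only 256 values, it is cheap to confirm that the abs() version agrees with the original branch-based macro for every possible input. A sketch of such an exhaustive check follows; diff_reference and diff_matches_reference are hypothetical helper names, and diff carries an explicit cast so it compiles under -Werror=conversion.

```cpp
#include <cstdint>
#include <cstdlib>

// abs()-based version, with an explicit cast back to uint8_t.
std::uint8_t diff(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::abs(a - b));
}

// Branch-based reference, equivalent to the question's UINT8_T_DIFF macro.
std::uint8_t diff_reference(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a > b ? a - b : b - a);
}

// There are only 256 * 256 input pairs, so every case can be checked.
bool diff_matches_reference() {
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            const auto ua = static_cast<std::uint8_t>(a);
            const auto ub = static_cast<std::uint8_t>(b);
            if (diff(ua, ub) != diff_reference(ua, ub)) {
                return false;
            }
        }
    }
    return true;
}
```

Running a check like this once (for example, in a unit test) lets you swap implementations freely while chasing speed.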
You can see how different compilers compile this on Godbolt (Compiler Explorer). In particular, GCC emits an interesting sequence that uses neither branches nor CMOV instructions:
movzx eax, dil
movzx esi, sil
sub eax, esi
cdq
xor eax, edx
sub eax, edx
ret
This uses the classic branchless absolute-value trick. CDQ sign-extends EAX into EDX:EAX, so EDX becomes 0 if the difference is non-negative and all ones (-1) if it is negative. XORing EAX with EDX then flips every bit of a negative value, and subtracting EDX adds 1; together those two steps negate it in two's complement. Clang, meanwhile, uses a CMOV instruction. I have no idea which one performs better in practice; you may need to benchmark both on the specific architecture you're targeting.
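To make the CDQ/XOR/SUB sequence concrete, here is a rough C++ transcription of what it computes. It is only a sketch for understanding; you should not need to write it by hand, since the compiler already emits it from abs(). The name diff_branchless is hypothetical, and the code assumes arithmetic right shift of negative values, which C++20 guarantees and GCC and Clang provide in earlier standards (before C++20 it is implementation-defined).

```cpp
#include <cstdint>

// Hypothetical name: diff_branchless. Mirrors GCC's CDQ / XOR / SUB sequence.
std::uint8_t diff_branchless(std::uint8_t a, std::uint8_t b) {
    // The difference of two uint8_t values lies in [-255, 255].
    const std::int32_t d = static_cast<std::int32_t>(a) - static_cast<std::int32_t>(b);
    // mask is 0 when d >= 0 and -1 (all bits set) when d < 0, like EDX after CDQ.
    // Assumes arithmetic right shift (guaranteed in C++20, true on GCC/Clang).
    const std::int32_t mask = d >> 31;
    // mask == 0:  (d ^ 0) - 0 == d
    // mask == -1: (d ^ -1) - (-1) == ~d + 1 == -d
    return static_cast<std::uint8_t>((d ^ mask) - mask);
}
```

For instance, with a = 3 and b = 10, d is -7, mask is -1, d ^ mask is 6, and 6 - (-1) gives 7.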
As this is a common operation, I doubt that there's any assembly you can write by hand that would be faster than both of these approaches.
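If you do benchmark, one rough approach is to compile the same harness with GCC and with Clang at your production optimization level and time it on the target machine. The sketch below is an assumption-laden starting point, not a rigorous benchmark: run_diff_benchmark and BenchResult are hypothetical names, and the input pattern is arbitrary. The checksum is returned so the compiler cannot discard the loop. Be aware that inside a loop the compiler may auto-vectorize the computation, so the measured code can differ from the scalar CDQ or CMOV sequence; inspect the disassembly to see what you are actually timing.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

std::uint8_t diff(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(std::abs(a - b));
}

// Hypothetical result type: checksum keeps the work observable.
struct BenchResult {
    std::uint64_t checksum;
    double seconds;
};

// Hypothetical helper: applies diff() across two buffers `repetitions` times.
BenchResult run_diff_benchmark(std::size_t n, int repetitions) {
    std::vector<std::uint8_t> xs(n), ys(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Arbitrary deterministic inputs; the casts wrap modulo 256.
        xs[i] = static_cast<std::uint8_t>(i * 37u);
        ys[i] = static_cast<std::uint8_t>(i * 101u + 13u);
    }
    std::uint64_t checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        for (std::size_t i = 0; i < n; ++i) {
            checksum += diff(xs[i], ys[i]);
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return {checksum, std::chrono::duration<double>(stop - start).count()};
}
```

Use a large n and many repetitions for real measurements, and compare the checksums between builds to confirm both compilers computed the same thing.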