Here is what I'm trying to achieve. It's simple enough:
unsigned int foo1(bool cond, unsigned int num)
{
    return cond ? num : 0;
}
Assembly:
test dil, dil
mov eax, 0
cmovne eax, esi
ret
My question is, is there a faster way to do it? Here are some ways I thought of:
unsigned int foo2(bool cond, unsigned int num)
{
    return cond * num;
}
Assembly:
movzx eax, dil
imul eax, esi
ret
unsigned int foo3(bool cond, unsigned int num)
{
    static const unsigned int masks[2] = { 0x0, 0xFFFFFFFF };
    return masks[cond] & num;
}
Assembly:
movzx edi, dil
mov eax, DWORD PTR foo3(bool, unsigned int)::masks[0+rdi*4]
and eax, esi
ret
unsigned int foo4(bool cond, unsigned int num)
{
    return (0 - (unsigned)cond) & num;
}
Assembly:
movzx eax, dil
neg eax
and eax, esi
ret
Multiplication yields the fewest instructions, so I think it's the best choice, but I'm not sure about the cost of the imul. Any suggestions?
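Before comparing speed, it may be worth confirming that the four variants really return the same values. The following is a minimal sketch, assuming a 32-bit unsigned int as the masks in foo3 already do; all_agree is a hypothetical helper that is not part of the original code.

```cpp
#include <cassert>

// The four variants from the question, unchanged in behavior.
unsigned int foo1(bool cond, unsigned int num) { return cond ? num : 0; }
unsigned int foo2(bool cond, unsigned int num) { return cond * num; }
unsigned int foo3(bool cond, unsigned int num)
{
    static const unsigned int masks[2] = { 0x0, 0xFFFFFFFF };
    return masks[cond] & num;
}
unsigned int foo4(bool cond, unsigned int num)
{
    return (0 - (unsigned)cond) & num;
}

// Hypothetical helper (an assumption, not from the question):
// true if foo2, foo3 and foo4 match foo1 for this input.
bool all_agree(bool cond, unsigned int num)
{
    const unsigned int expected = foo1(cond, num);
    return foo2(cond, num) == expected
        && foo3(cond, num) == expected
        && foo4(cond, num) == expected;
}

// Checks both conditions for a few edge values.
void check_variants()
{
    const unsigned int samples[] = { 0u, 1u, 10u, 0x80000000u, 0xFFFFFFFFu };
    for (unsigned int num : samples) {
        assert(all_agree(false, num));
        assert(all_agree(true, num));
    }
}
```

Calling check_variants() from main is enough to run the comparison. It says nothing about speed, only that the variants are interchangeable.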
Thanks in advance,
A ternary operator forms a single expression, while an if-else is a statement. That alone does not make the ternary faster: optimizing compilers typically generate the same code for both.
An alternative to the ternary operation is the && (AND) operation. Because the AND operator short-circuits when the left operand is falsy, it behaves like the first part of the ternary operator. This means a statement with one conditional concern can easily be extended to two concerns.
It is not faster. There is one difference: with the ternary operator you can initialize a constant variable depending on some expression: const int x = (a<b) ?
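To illustrate the point about constant initialization, here is a small sketch. The original expression is cut off, so the operands after the ? below (picking the smaller value) are an assumption made up for this example, as are the function names.

```cpp
#include <cassert>

// Hypothetical example: picks the smaller of two ints.
int smaller_ternary(int a, int b)
{
    // A const object must be initialized where it is defined,
    // so the choice has to be made in a single expression.
    const int x = (a < b) ? a : b;
    return x;
}

int smaller_if_else(int a, int b)
{
    int x;  // cannot be const: it is assigned after its definition
    if (a < b) {
        x = a;
    } else {
        x = b;
    }
    return x;
}

// Both forms pick the same value; only the first lets x be const.
void check_smaller()
{
    assert(smaller_ternary(3, 7) == smaller_if_else(3, 7));
    assert(smaller_ternary(7, 3) == smaller_if_else(7, 3));
}
```

The difference is about what the language lets you express, not about the generated code.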
Moreover, as has been pointed out, at the byte code level there's really no difference between the ternary operator and if-then-else. As in the above example, the decision on which to choose is based wholly on readability.
Multiplications and memory accesses frequently take more time than a simple if statement. If you want to optimize this code, the best way would be to use only "and" or "or" instructions (and, by the way, to declare the function inline to avoid a function call).
Here is an 'optimized' example of your function using masks instead of booleans:
inline unsigned int foo1(unsigned int mask, unsigned int num)
{
    return mask & num;
}
Your call would look like this:
foo1(0, 10); /* Returns 0 */
foo1(~0, 10); /* Returns 10 */
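If the caller only has a bool, the mask still has to be built somewhere. One possible sketch is shown here; make_mask is a hypothetical helper, and the answer's function is renamed select_by_mask only to avoid clashing with the question's foo1.

```cpp
#include <cassert>

// The mask-based function from the answer, under an assumed new name.
inline unsigned int select_by_mask(unsigned int mask, unsigned int num)
{
    return mask & num;
}

// Hypothetical helper: all-zero bits for false, all-one bits for true.
// Unsigned arithmetic wraps around, so 0u - 1u is the all-ones value.
inline unsigned int make_mask(bool cond)
{
    return 0u - static_cast<unsigned int>(cond);
}

// Small self-check of the two possible mask values.
inline void check_masks()
{
    assert(make_mask(false) == 0u);
    assert(make_mask(true) == ~0u);
}
```

A call such as select_by_mask(make_mask(cond), num) does the same work as foo4 from the question, so the mask version probably only pays off when the mask is computed once and reused for many values.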
Optimizing code isn't always as easy as counting assembler instructions and CPU ticks.
The multiplication method is likely the fastest on most systems, since it removes a branch. The multiplication instruction should be reasonably fast on most CPU cores.
What you could consider, though, is whether you really need such large integer types. On small 8- or 16-bit CPUs, the following code would be significantly faster:
uint_fast16_t foo2(bool cond, uint_fast16_t num)
{
    return (uint_fast16_t)cond * num;
}
On the other hand, such CPUs rarely come with branch prediction or instruction cache.
You shouldn't need to worry about manual function inlining. Most compilers will inline a function this small automatically when optimizations are enabled.
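Since instruction counts alone do not settle the question, measuring on the target machine is the safer route. The following is only a rough sketch of a timing harness, not a rigorous benchmark: TimingResult, Input, time_variant, make_inputs and require_same_checksum are all hypothetical names, and the compiler may inline or vectorize the loop, so results will vary with CPU and flags.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Input {
    bool cond;
    unsigned int num;
};

struct TimingResult {
    unsigned int checksum;  // sum of all results, keeps the work observable
    double milliseconds;    // wall-clock time spent in the loop
};

// Hypothetical harness: applies f to every input and times the loop.
template <typename F>
TimingResult time_variant(F f, const std::vector<Input>& inputs)
{
    const auto start = std::chrono::steady_clock::now();
    unsigned int checksum = 0;
    for (const Input& in : inputs) {
        checksum += f(in.cond, in.num);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return { checksum, elapsed.count() };
}

// Hypothetical input generator: a simple linear congruential
// generator, so every run sees the same sequence.
std::vector<Input> make_inputs(std::size_t count)
{
    std::vector<Input> inputs;
    inputs.reserve(count);
    unsigned int state = 12345u;
    for (std::size_t i = 0; i < count; ++i) {
        state = state * 1664525u + 1013904223u;
        inputs.push_back({ ((state >> 16) & 1u) != 0, state });
    }
    return inputs;
}

// Two variants are only worth comparing if they compute the same thing.
inline void require_same_checksum(const TimingResult& a, const TimingResult& b)
{
    assert(a.checksum == b.checksum);
}
```

Each variant can be passed in as a lambda, for example time_variant([](bool c, unsigned int n) { return c * n; }, make_inputs(1000000)), and the milliseconds fields compared across several runs.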