A faster way to get a value based on a condition than the ternary operator?

Tags: c++, c, optimization

Here is what I'm trying to achieve. It's simple enough:

unsigned int foo1(bool cond, unsigned int num)
{
    return cond ? num : 0;
}

Assembly:

    test    dil, dil
    mov     eax, 0
    cmovne  eax, esi
    ret

My question is: is there a faster way to do it? Here are some approaches I thought of:

Using multiplication:

unsigned int foo2(bool cond, unsigned int num)
{
    return cond * num;
}

Assembly:

    movzx   eax, dil
    imul    eax, esi
    ret

Using memory access:

unsigned int foo3(bool cond, unsigned int num)
{
    static const unsigned int masks[2] = { 0x0, 0xFFFFFFFF };
    return masks[cond] & num;
}

Assembly:

    movzx   edi, dil
    mov     eax, DWORD PTR foo3(bool, unsigned int)::masks[0+rdi*4]
    and     eax, esi
    ret

Using some bit tricks:

unsigned int foo4(bool cond, unsigned int num) 
{
    return (0 - (unsigned)cond) & num;
}

Assembly:

    movzx   eax, dil
    neg     eax
    and     eax, esi
    ret

Multiplication yields the fewest instructions, so I think it's the best choice, but I'm not sure about the cost of the `imul`. Any suggestions?

Thanks in advance,

asked Nov 18 '16 08:11

Elad Weiss

2 Answers

Multiplications and memory accesses frequently take more time than a simple if statement. If you want to optimize this code, the best way is to use only AND or OR instructions (and, by the way, declare the function inline to avoid the cost of a function call).

Here is an 'optimized' version of your function that takes a mask instead of a boolean:

inline unsigned int foo1(unsigned int mask, unsigned int num)
{
  return mask & num;
}

Your calls would look like this:

foo1(0, 10);     /* Returns 0  */
foo1(~0, 10);    /* Returns 10 */

answered Oct 29 '22 18:10

K.Hacene

Optimizing code isn't always as easy as counting assembler instructions and CPU ticks.

The multiplication method is likely the fastest on most systems, since it removes a branch. The multiplication instruction should be reasonably fast on most CPU cores.

What you could consider, though, is whether you really need such large integer types. On small 8- or 16-bit CPUs, the following code would be significantly faster:

#include <stdbool.h>
#include <stdint.h>

uint_fast16_t foo2(bool cond, uint_fast16_t num)
{
    return (uint_fast16_t)cond * num;
}

On the other hand, such CPUs rarely come with branch prediction or an instruction cache.

You shouldn't need to worry about manual function inlining: most compilers will inline this function automatically.

answered Oct 29 '22 18:10

Lundin
