Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

is i=(i+1)&3 faster than i=(i+1)%4

I am optimizing a c++ code. at one critical step, I want to implement the following function y=f(x):

f(0)=1

f(1)=2

f(2)=3

f(3)=0

which one is faster ? using a lookup table or i=(i+1)&3 or i=(i+1)%4 ? or any better suggestion?

like image 211
mghandi Avatar asked Dec 05 '11 21:12

mghandi


2 Answers

Almost certainly the lookup table is going to be slowest. In a lot of cases, the compiler will generate the same assembly for (i+1)&3 and (i+1)%4; however depending on the type/signedness of i, they may not be strictly equivalent and the compiler won't be able to make that optimization. For example for the code

int foo(int i)
{
    return (i+1)%4;
}

unsigned bar(unsigned i)
{
    return (i+1)%4;
}

on my system, gcc -O2 generates:

0000000000000000 <foo>:
   0:   8d 47 01                lea    0x1(%rdi),%eax
   3:   89 c2                   mov    %eax,%edx
   5:   c1 fa 1f                sar    $0x1f,%edx
   8:   c1 ea 1e                shr    $0x1e,%edx
   b:   01 d0                   add    %edx,%eax
   d:   83 e0 03                and    $0x3,%eax
  10:   29 d0                   sub    %edx,%eax
  12:   c3                      retq   

0000000000000020 <bar>:
  20:   8d 47 01                lea    0x1(%rdi),%eax
  23:   83 e0 03                and    $0x3,%eax
  26:   c3                      retq

so as you can see because of the rules about signed modulus results, (i+1)%4 generates a lot more code in the first place.

Bottom line, you're probably best off using the (i+1)&3 version if that expresses what you want, because there's less chance for the compiler to do something you don't expect.

like image 126
Roland Avatar answered Oct 02 '22 00:10

Roland


I won't get into the discussion of premature optimization. But the answer is that they will be the same speed.

Any sane compiler will compile them to the same thing. Division/modulus by a power of two will be optimized to bitwise operations anyway.

So use whichever you find (or others will find) to be more readable.

EDIT : As Roland has pointed out, it does sometimes behave different depending on the signness:

Unsigned &:

int main(void)
{
    unsigned x;
    cin >> x;
    x = (x + 1) & 3;
    cout << x;

    return 0;
}

mov eax, DWORD PTR _x$[ebp]
inc eax
and eax, 3
push    eax

Unsigned Modulus:

int main(void)
{
    unsigned x;
    cin >> x;
    x = (x + 1) % 4;
    cout << x;

    return 0;
}

mov eax, DWORD PTR _x$[ebp]
inc eax
and eax, 3
push    eax

Signed &:

int main(void)
{
    int x;
    cin >> x;
    x = (x + 1) & 3;
    cout << x;

    return 0;
}

mov eax, DWORD PTR _x$[ebp]
inc eax
and eax, 3
push    eax

Signed Modulus:

int main(void)
{
    int x;
    cin >> x;
    x = (x + 1) % 4;
    cout << x;

    return 0;
}

mov eax, DWORD PTR _x$[ebp]
inc eax
and eax, -2147483645            ; 80000003H
jns SHORT $LN3@main
dec eax
or  eax, -4                 ; fffffffcH
like image 40
Mysticial Avatar answered Oct 01 '22 22:10

Mysticial