Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize a C for loop?

I have a performance issue in a bottleneck section in my code. Basically it's a simple nested loop.

Profiling the issue reveals that the program spends a lot of time just incrementing both of the loop counters (++) and testing for termination (i/j < 8).

Watching the assembly output I see that both counters don't get registers and accessing them costs a lot of cycles. Using the "register" keyword doesn't convince the compiler to actually put them in registers. is there something which can be done to to optimize the counters access time?

Here's the assembly output. The C source is just a simple nested loop with i/j counters.

  2738  0.2479  2459  0.1707   :    1e6c:   jne    1dd1 <process_hooks+0x121>
  1041  0.0942  1120  0.0778   :    1e72:   addl   $0x1,0xffffffd4(%ebp)
  2130  0.1928  2102  0.1459   :    1e76:   cmpl   $0x8,0xffffffd4(%ebp)
  2654  0.2403  2337  0.1622   :    1e7a:   jne    1da0 <process_hooks+0xf0>
  809   0.0732   814  0.0565   :    1e80:   jmp    1ce2 <process_hooks+0x32>

As requested, here's the C code as well. Compiler is gcc btw:

for (byte_index=0; byte_index < MASK_SIZE / NBBY; byte_index++)
{
    if (check_byte(mask,byte_index))
    {
        for (bit_index=0; bit_index < NBBY; bit_index++)
        {
            condition_index = byte_index*NBBY + bit_index;
            if (check_bit(condition_mask,condition_index))
            {
                .
                .
                .
            }
        }
    }
}

Thanks

like image 930
Nir Avatar asked Jul 30 '09 07:07

Nir


People also ask

Which loop is more efficient in C?

do-while is fastest to run the first iteration as there is no checking of a condition at the start.


2 Answers

There are two possible reasons it doesn't get put in a register:

The variable needs to be kept in memory

If you are taking the address of the variable or declaring it volatile, it wont be kept in a register. It doesn't look like you are doing that, but it might happen in the ... section.

gcc is doing a bad job of register allocation.

This is quite likely. gcc seems to have a poor allocator (based on the comments of its developers). In addition, register allocation is fickle and hard to reason about. You will probably be able to tweak it to get some benefits using register allocator optimizations. If you like, you can set them for that function only.

gcc 4.4 has a new register allocator, which is supposed to be better, but also allows you to choose the allocation algorithm. That will provide extra things to tweak.

You can also try telling gcc to try harder, with the hot attribute.

Finally, you can also tweak things using gcc's --param flags. They expose internal compiler settings, so this should probably not be embarked upon lightly.

like image 122
Paul Biggar Avatar answered Oct 10 '22 16:10

Paul Biggar


When getting a performance bottle neck in a loop counter you should consider unrolling the loop.

EDIT: As always, when optimizing, make sure you benchmark and convince yourself that you get the desired result.

like image 22
Laserallan Avatar answered Oct 10 '22 17:10

Laserallan