Manually optimize a nested loop

I'm working on a homework assignment where I must manually optimize a nested loop (my program will be compiled with optimizations disabled). The goal of the assignment is to run the entire program in less than 6 seconds (extra credit for less than 4.5 seconds).

I'm only allowed to change a small block of code. This is the starting point:

    for (j=0; j < ARRAY_SIZE; j++) {
        sum += array[j];
    }

Here ARRAY_SIZE is 9973. This loop sits inside another loop that runs 200,000 times. This version runs in 16 seconds.

What I've done so far is change the implementation to unroll the loop and use pointers as my iterator:

(These declarations sit outside the loop that runs 200,000 times.)

    register int unroll_length = 16;
    register int *unroll_end = array + (ARRAY_SIZE - (ARRAY_SIZE % unroll_length));
    register int *end = array + (ARRAY_SIZE - 1);
    register int *curr_end;

    /* Replacement for the inner loop: walk the array from the back. */
    curr_end = end;

    /* Sum the tail elements one at a time until curr_end reaches unroll_end. */
    while (unroll_end != curr_end) {
        sum += *curr_end;
        curr_end--;
    }

    /* Sum the rest in blocks of 16. */
    do {
        sum += *curr_end + *(curr_end-1) + *(curr_end-2) + *(curr_end-3) +
            *(curr_end-4) + *(curr_end-5) + *(curr_end-6) + *(curr_end-7) +
            *(curr_end-8) + *(curr_end-9) + *(curr_end-10) + *(curr_end-11) +
            *(curr_end-12) + *(curr_end-13) + *(curr_end-14) + *(curr_end-15);
    } while ((curr_end -= unroll_length) != array);

    /* curr_end now points at array[0], which has not been added yet. */
    sum += *curr_end;

Using these techniques, I got the execution time down to 5.5 seconds, which earns full credit. However, I'd also like the extra credit, and I'm curious: what additional optimizations might I be overlooking?

Edit #1 (Adding outer loop)

    srand(time(NULL));
    for (j = 0; j < ARRAY_SIZE; j++) {
        x = rand() / (int)(((unsigned)RAND_MAX + 1) / 14);
        array[j] = x;
        checksum += x;
    }

    for (i = 0; i < N_TIMES; i++) {

        // inner loop goes here

        if (sum != checksum)
            printf("Checksum error!\n");

        sum = 0;

    }

797

asked May 10 '11 15:05

JMP

1 Answer

You could try storing your variables in CPU registers:

    register int *unroll_limit = array + (ARRAY_SIZE - (ARRAY_SIZE % 10));
    register int *end = array + ARRAY_SIZE;
    register int *curr;

Then try different unroll lengths for the manual loop to find the one that makes the best use of the cache.
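A minimal sketch of how these declarations might drive a complete loop, assuming an unroll factor of 10 (taken from the `% 10` in the declaration) and forward iteration. The function names `sum_unrolled` and `sum_simple` and the `UNROLL` macro are hypothetical, introduced only for illustration. Note that `register` is only a hint to the compiler, so any gain depends on the compiler and flags in use.

```c
#define ARRAY_SIZE 9973   /* value given in the question */
#define UNROLL 10         /* assumed unroll factor, from the `% 10` above */

/* Hypothetical helper: sums the array with the loop body unrolled 10 times. */
int sum_unrolled(int *array)
{
    /* First address past the last complete block of UNROLL elements. */
    register int *unroll_limit = array + (ARRAY_SIZE - (ARRAY_SIZE % UNROLL));
    register int *end = array + ARRAY_SIZE;   /* one past the last element */
    register int *curr = array;
    register int sum = 0;

    /* Main loop: one comparison and one pointer update per 10 additions. */
    while (curr != unroll_limit) {
        sum += curr[0] + curr[1] + curr[2] + curr[3] + curr[4]
             + curr[5] + curr[6] + curr[7] + curr[8] + curr[9];
        curr += UNROLL;
    }

    /* Cleanup loop: the ARRAY_SIZE % UNROLL leftover elements. */
    while (curr != end) {
        sum += *curr++;
    }

    return sum;
}

/* Reference version: the plain loop from the question, for checking results. */
int sum_simple(int *array)
{
    int j;
    int sum = 0;

    for (j = 0; j < ARRAY_SIZE; j++) {
        sum += array[j];
    }
    return sum;
}
```

Both functions should return the same value for the same array. To compare unroll lengths, change `UNROLL` together with the number of terms in the main loop body, and time each variant inside the 200,000-iteration outer loop on the target machine.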

147

answered Sep 24 '22 23:09

Cédric Julien

Tags: c, optimization
