Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why this loop is not optimised out?

Tags:

c

I've got a very simple c program which copies all elements from array A to back to array A. For example,

double *A;
A = (double*)malloc(sizeof(double)*SIZE);
for( i = 0; i < SIZE; i++) {
  A[i] = A[i];
}

I was expecting this to be optimised out by the compiler and eventually turned into a noop. However, by measuring the runtime of this loop and looking at the assembly code, it seems that the element is indeed loaded from memory into register and then stored back to the same memory location. I have -O3 enabled. Can anyone explain to me why the c compiler does not optimise it? Or am I missing something here?

Many thanks.

like image 856
user983027 Avatar asked Oct 06 '11 21:10

user983027


2 Answers

From a hardware perspective, loading and saving a double is not a no-op; its bitwise value can change if it is one of several trap representations of an IEEE double.

For example, if you load a NaN into a register, it will be written out as the canonical NaN value, which may not be the same bitwise value.

like image 88
MSN Avatar answered Nov 04 '22 00:11

MSN


my gcc (version 4.6.1) optimizes it out

$ cat 7680489.c
#include <stdlib.h>

#define SIZE 100

int main(void) {
  double *a;
  size_t i;

  a = calloc(SIZE, sizeof *a); /* initialize elements */
  for (i = 0; i < SIZE; i++) a[i] = a[i];
  free(a);

  return 0;
}
$ gcc -std=c89 -O3 -S 7680489.c
$ cat 7680489.s
        .file   "7680489.c"
        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB3:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $8, %esi
        movl    $100, %edi
        call    calloc
        movq    %rax, %rdi
        call    free
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE3:
        .size   main, .-main
        .ident  "GCC: (Debian 4.6.1-4) 4.6.1"
        .section        .note.GNU-stack,"",@progbits

No loop that I can see. The assembly output is very similar when using malloc rather than calloc. I switched to calloc to avoid having objects with indeterminate values about (thanks R..).

like image 33
pmg Avatar answered Nov 04 '22 02:11

pmg