Why is this loop not optimised out?

Question

I've got a very simple C program which copies all elements from array A back to array A. For example:

double *A;
A = (double*)malloc(sizeof(double)*SIZE);
for( i = 0; i < SIZE; i++) {
  A[i] = A[i];
}

I was expecting this to be optimised out by the compiler and turned into a no-op. However, measuring the runtime of this loop and looking at the assembly code shows that each element is indeed loaded from memory into a register and then stored back to the same memory location. I have -O3 enabled. Can anyone explain why the C compiler does not optimise it away? Or am I missing something here?

Many thanks.

MSN · Accepted Answer

From a hardware perspective, loading and storing a double is not necessarily a no-op; its bitwise value can change if it is one of several special bit patterns of an IEEE double, such as a signaling NaN.

For example, if you load a NaN into a register, it may be written back out as a canonical (quiet) NaN value, which may not be the same bitwise value. The x87 FPU, for instance, quiets a signaling NaN when it loads a double.

pmg · Answer

My gcc (version 4.6.1) optimizes it out:

$ cat 7680489.c

#include <stdlib.h>

#define SIZE 100

int main(void) {
  double *a;
  size_t i;

  a = calloc(SIZE, sizeof *a); /* initialize elements */
  for (i = 0; i < SIZE; i++) a[i] = a[i];
  free(a);

  return 0;
}

$ gcc -std=c89 -O3 -S 7680489.c
$ cat 7680489.s

        .file   "7680489.c"
        .section        .text.startup,"ax",@progbits
        .p2align 4,,15
        .globl  main
        .type   main, @function
main:
.LFB3:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        movl    $8, %esi
        movl    $100, %edi
        call    calloc
        movq    %rax, %rdi
        call    free
        xorl    %eax, %eax
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
.LFE3:
        .size   main, .-main
        .ident  "GCC: (Debian 4.6.1-4) 4.6.1"
        .section        .note.GNU-stack,"",@progbits

No loop that I can see. The assembly output is very similar when using malloc rather than calloc. I switched to calloc to avoid having objects with indeterminate values about (thanks R..).
