I've got a very simple c program which copies all elements from array A to back to array A. For example,
double *A;
A = (double*)malloc(sizeof(double)*SIZE);
for( i = 0; i < SIZE; i++) {
A[i] = A[i];
}
I was expecting this to be optimised out by the compiler and eventually turned into a noop. However, by measuring the runtime of this loop and looking at the assembly code, it seems that the element is indeed loaded from memory into register and then stored back to the same memory location. I have -O3 enabled. Can anyone explain to me why the c compiler does not optimise it? Or am I missing something here?
Many thanks.
From a hardware perspective, loading and saving a double is not a no-op; its bitwise value can change if it is one of several trap representations of an IEEE double.
For example, if you load a NaN into a register, it will be written out as the canonical NaN value, which may not be the same bitwise value.
my gcc (version 4.6.1) optimizes it out
$ cat 7680489.c
#include <stdlib.h> #define SIZE 100 int main(void) { double *a; size_t i; a = calloc(SIZE, sizeof *a); /* initialize elements */ for (i = 0; i < SIZE; i++) a[i] = a[i]; free(a); return 0; }
$ gcc -std=c89 -O3 -S 7680489.c
$ cat 7680489.s
.file "7680489.c" .section .text.startup,"ax",@progbits .p2align 4,,15 .globl main .type main, @function main: .LFB3: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movl $8, %esi movl $100, %edi call calloc movq %rax, %rdi call free xorl %eax, %eax addq $8, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc .LFE3: .size main, .-main .ident "GCC: (Debian 4.6.1-4) 4.6.1" .section .note.GNU-stack,"",@progbits
No loop that I can see. The assembly output is very similar when using malloc
rather than calloc
. I switched to calloc
to avoid having objects with indeterminate values about (thanks R..).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With