Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this no-op loop not optimized away?

The following code does some copying from one array of zeroes interpreted as floats to another one, and prints timing of this operation. As I've seen many cases where no-op loops are just optimized away by compilers, including gcc, I was waiting that at some point of changing my copy-arrays program it will stop doing the copying.

#include <iostream>
#include <cstring>
#include <sys/time.h>

static inline long double currentTime()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC,&ts);
    return ts.tv_sec+(long double)(ts.tv_nsec)*1e-9;
}

int main()
{
    size_t W=20000,H=10000;

    float* data1=new float[W*H];
    float* data2=new float[W*H];
    memset(data1,0,W*H*sizeof(float));
    memset(data2,0,W*H*sizeof(float));

    long double time1=currentTime();
    for(int q=0;q<16;++q) // take more time
        for(int k=0;k<W*H;++k)
            data2[k]=data1[k];
    long double time2=currentTime();

    std::cout << (time2-time1)*1e+3 << " ms\n";

    delete[] data1;
    delete[] data2;
}

I compiled this with g++ 4.8.1 command g++ main.cpp -o test -std=c++0x -O3 -lrt. This program prints 6952.17 ms for me. (I had to set ulimit -s 2000000 for it to not crash.)

I also tried changing creation of arrays with new to automatic VLAs, removing memsets, but this doesn't change g++ behavior (apart from changing timings by several times).

It seems the compiler could prove that this code won't do anything sensible, so why didn't it optimize the loop away?

like image 852
Ruslan Avatar asked Feb 24 '14 09:02

Ruslan


People also ask

How to prevent compiler from optimizing variable?

Select the compiler. Under the Optimization category change optimization to Zero. When done debugging you can uncheck Override Build Options for the file. In the latter case the volatile defined inside the function can get optimized out quite often. ...

How to avoid compiler optimization in C?

But in practice gcc will ignore the statement by dead store elimination. In order to prevent gcc optimizing it, I re-write the statement as follows: volatile int tmp; tmp = pageptr[0]; pageptr[0] = tmp; It seems the trick works, but somewhat ugly.


1 Answers

Anyway it isn't impossible (clang++ version 3.3):

clang++ main.cpp -o test -std=c++0x -O3 -lrt

The program prints 0.000367 ms for me... and looking at the assembly language:

...
callq   clock_gettime
movq    56(%rsp), %r14
movq    64(%rsp), %rbx
leaq    56(%rsp), %rsi
movl    $1, %edi
callq   clock_gettime
...

while for g++:

...
call    clock_gettime
fildq   32(%rsp)
movl    $16, %eax
fildq   40(%rsp)
fmull   .LC0(%rip)
faddp   %st, %st(1)
.p2align 4,,10
.p2align 3
.L2:
 movl    $1, %ecx
 xorl    %edx, %edx
 jmp     .L5
 .p2align 4,,10
 .p2align 3
 .L3:
 movq    %rcx, %rdx
 movq    %rsi, %rcx
 .L5:
 leaq    1(%rcx), %rsi
 movss   0(%rbp,%rdx,4), %xmm0
 movss   %xmm0, (%rbx,%rdx,4)
 cmpq    $200000001, %rsi
 jne     .L3
 subl    $1, %eax
 jne     .L2
 fstpt   16(%rsp)
 leaq    32(%rsp), %rsi
 movl    $1, %edi
 call    clock_gettime
 ...

EDIT (g++ v4.8.2 / clang++ v3.3)

SOURCE CODE - ORIGINAL VERSION (1)

...
size_t W=20000,H=10000;

float* data1=new float[W*H];
float* data2=new float[W*H];
...

SOURCE CODE - MODIFIED VERSION (2)

...
const size_t W=20000;
const size_t H=10000;

float data1[W*H];
float data2[W*H];
...

Now the case that isn't optimized is (1) + g++

like image 144
manlio Avatar answered Nov 14 '22 18:11

manlio