Why is this no-op loop not optimized away?

Tags: c++, optimization, gcc

The following code copies one array of zeroes, interpreted as floats, into another and prints how long the operation takes. Since I've seen many cases where compilers, including gcc, simply optimize no-op loops away, I expected that at some point while modifying my array-copying program it would stop doing the copying.

#include <iostream>
#include <cstring>
#include <sys/time.h>

static inline long double currentTime()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC,&ts);
    return ts.tv_sec+(long double)(ts.tv_nsec)*1e-9;
}

int main()
{
    size_t W=20000,H=10000;

    float* data1=new float[W*H];
    float* data2=new float[W*H];
    memset(data1,0,W*H*sizeof(float));
    memset(data2,0,W*H*sizeof(float));

    long double time1=currentTime();
    for(int q=0;q<16;++q) // take more time
        for(int k=0;k<W*H;++k)
            data2[k]=data1[k];
    long double time2=currentTime();

    std::cout << (time2-time1)*1e+3 << " ms\n";

    delete[] data1;
    delete[] data2;
}

I compiled this with g++ 4.8.1 using the command g++ main.cpp -o test -std=c++0x -O3 -lrt. The program prints 6952.17 ms for me. (I had to set ulimit -s 2000000 for it to not crash.)

I also tried replacing the arrays allocated with new by automatic VLAs and removing the memsets, but this doesn't change g++'s behavior (apart from changing the timings by a factor of several).

It seems the compiler could prove that this code won't do anything sensible, so why didn't it optimize the loop away?


asked Feb 24 '14 09:02

Ruslan

1 Answer

It isn't impossible, though. With clang++ 3.3:

clang++ main.cpp -o test -std=c++0x -O3 -lrt

The program prints 0.000367 ms for me, and in the generated assembly the two clock_gettime calls follow each other with no loop in between:

...
callq   clock_gettime
movq    56(%rsp), %r14
movq    64(%rsp), %rbx
leaq    56(%rsp), %rsi
movl    $1, %edi
callq   clock_gettime
...

With g++, the copy loop is still there between the two calls:

...
call    clock_gettime
fildq   32(%rsp)
movl    $16, %eax
fildq   40(%rsp)
fmull   .LC0(%rip)
faddp   %st, %st(1)
.p2align 4,,10
.p2align 3
.L2:
movl    $1, %ecx
xorl    %edx, %edx
jmp     .L5
.p2align 4,,10
.p2align 3
.L3:
movq    %rcx, %rdx
movq    %rsi, %rcx
.L5:
leaq    1(%rcx), %rsi
movss   0(%rbp,%rdx,4), %xmm0
movss   %xmm0, (%rbx,%rdx,4)
cmpq    $200000001, %rsi
jne     .L3
subl    $1, %eax
jne     .L2
fstpt   16(%rsp)
leaq    32(%rsp), %rsi
movl    $1, %edi
call    clock_gettime
...
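
As a sanity check on the clang++ number, the sketch below measures what is left once the loop is gone: two back-to-back clock_gettime calls with nothing in between. The helper name emptyRegionMs is made up for illustration; currentTime is the function from the question. I would expect the result to be a tiny fraction of a millisecond, in the same ballpark as the 0.000367 ms above, but the exact value depends on the machine.

```cpp
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC (POSIX)

// Timing helper, same as in the question.
static inline long double currentTime()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + (long double)(ts.tv_nsec) * 1e-9;
}

// Hypothetical helper (not part of the original program): times an
// empty region, i.e. what remains of the benchmark once the compiler
// has removed the copy loop. Returns the elapsed time in milliseconds.
static long double emptyRegionMs()
{
    long double time1 = currentTime();
    long double time2 = currentTime();
    return (time2 - time1) * 1e+3;
}
```

Printing emptyRegionMs() from main gives a baseline to compare against: a measured copy time close to that baseline suggests the loop was eliminated.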

EDIT (g++ v4.8.2 / clang++ v3.3)

SOURCE CODE - ORIGINAL VERSION (1)

...
size_t W=20000,H=10000;

float* data1=new float[W*H];
float* data2=new float[W*H];
...

SOURCE CODE - MODIFIED VERSION (2)

...
const size_t W=20000;
const size_t H=10000;

float data1[W*H];
float data2[W*H];
...

With these two versions, the case that isn't optimized away is version (1) compiled with g++.
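
To try both variants from a single file, something along these lines could work. It is only a sketch: the USE_FIXED_ARRAYS macro, the copyTimeMs name and the much smaller W and H are my assumptions, chosen so that the fixed-size arrays fit in a default stack. The loop index is also a size_t here rather than the int used in the question. Either change may alter what a given compiler does, so the generated assembly is still the thing to check.

```cpp
#include <cstddef>
#include <cstring>
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC (POSIX)

// Timing helper, same as in the question.
static inline long double currentTime()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + (long double)(ts.tv_nsec) * 1e-9;
}

// Hypothetical helper: runs the copy loop from the question and returns
// the elapsed time in milliseconds. The sizes are assumed values, far
// smaller than the 20000 x 10000 used in the question.
static long double copyTimeMs()
{
#ifdef USE_FIXED_ARRAYS
    // Version (2): constant sizes, fixed-size automatic arrays.
    const std::size_t W = 200;
    const std::size_t H = 100;
    float data1[W * H];
    float data2[W * H];
#else
    // Version (1): non-const sizes, arrays allocated with new.
    std::size_t W = 200, H = 100;
    float* data1 = new float[W * H];
    float* data2 = new float[W * H];
#endif
    std::memset(data1, 0, W * H * sizeof(float));
    std::memset(data2, 0, W * H * sizeof(float));

    long double time1 = currentTime();
    for (int q = 0; q < 16; ++q) // take more time
        for (std::size_t k = 0; k < W * H; ++k)
            data2[k] = data1[k];
    long double time2 = currentTime();

#ifndef USE_FIXED_ARRAYS
    delete[] data1;
    delete[] data2;
#endif
    return (time2 - time1) * 1e+3;
}
```

Call copyTimeMs() from main and build once with -DUSE_FIXED_ARRAYS and once without, using the same -O3 options as above; adding -S to the compiler command writes out the assembly so the two builds can be compared.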


answered Nov 14 '22 18:11

manlio
