Why does the compiler not always optimize away local variables?

Tags: c++, compiler-optimization, gcc, micro-optimization

I am trying to understand if the removal of local intermediate variables could lead to better optimized code. Consider the following MWE, paying particular attention to the two functions f and g:

struct A {
    double d;
};

struct B {
    double s;
};

struct C {
    A a;
    B b;
};

A geta();
B getb();

C f() {
    const A a = geta();
    const B b = getb();

    C c;
    c.a = a;
    c.b = b;
    return c;
}

C g() {
    C c;
    c.a = geta();
    c.b = getb();
    return c;
}

Both f and g call geta() and getb() to populate an instance of class C which is then returned, but f uses two local intermediate variables to store the returned values of geta() and getb(), while g directly assigns the returned values to the members of c.

Compiling with gcc 9.2 at -O3, the binaries for the two functions f and g are exactly the same. However, adding another member variable to either class A or class B leads to different binaries: the binary for f contains a few more instructions than the one for g. The same holds for clang 8.0.0 with the -O3 flag.
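For concreteness, here is a sketch of what the modified case might look like. The extra member name d2 is hypothetical (the question does not show it), and the bodies of geta() and getb() are stand-ins added only so the snippet can be built and run; in the MWE they are declarations only.

```cpp
struct A {
    double d;
    double d2;  // hypothetical extra member; the name is an assumption
};

struct B {
    double s;
};

struct C {
    A a;
    B b;
};

// Stand-in definitions (assumed). The original MWE only declares these.
A geta() { return A{1.0, 2.0}; }
B getb() { return B{3.0}; }

// Same shape as f in the MWE: results go through local intermediates.
C f() {
    const A a = geta();
    const B b = getb();

    C c;
    c.a = a;
    c.b = b;
    return c;
}

// Same shape as g in the MWE: results are assigned straight into c.
C g() {
    C c;
    c.a = geta();
    c.b = getb();
    return c;
}
```

Both functions return the same value here. To look at the difference in generated code, geta() and getb() should presumably stay as declarations without bodies, since visible definitions let the compiler inline them.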

What is happening here? Why is the compiler not able to optimize away the local intermediate variables of f when A or B gets a little more complex? Isn't the code of f and g equivalent?

In addition, the behavior is not the same for MSVC v19.22 with the /O2 flag: the compiler from Microsoft already produces different binaries in the first case, i.e. with classes A and B each consisting of a single double.

I am using Godbolt: you can find here the code which produces different binaries.

asked Sep 12 '19 07:09

Rackbox

1 Answer

This is a missed optimization

Neither function takes the address of C c, so escape analysis should easily prove it's a pure local that nothing else could have a pointer to. geta() and getb() can't be reading or writing that variable directly, therefore it's safe to store the geta() return value directly into c.a instead of a temporary on the stack.
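To see why the escape check matters, here is a contrived sketch of the opposite situation, where the address of c does leak. Everything beyond the struct definitions is hypothetical: the global pointer escaped, the function names f_escaping and g_escaping, and the bodies of geta() and getb() are made up for illustration.

```cpp
struct A { double d; };
struct B { double s; };
struct C { A a; B b; };

// Hypothetical global through which the local c becomes visible to callees.
C* escaped = nullptr;

// Assumed stand-in bodies, not from the question.
A geta() { return A{1.0}; }

// getb() peeks at c.a through the leaked pointer and returns what it saw.
B getb() {
    const double seen = escaped ? escaped->a.d : -1.0;
    return B{seen};
}

// Shape of f: c.a is written only after both calls have returned.
C f_escaping() {
    C c{};              // value-initialized, so c.a.d starts at 0.0
    escaped = &c;       // the address escapes here
    const A a = geta();
    const B b = getb(); // observes c.a.d == 0.0
    c.a = a;
    c.b = b;
    escaped = nullptr;
    return c;
}

// Shape of g: c.a is written before getb() runs.
C g_escaping() {
    C c{};
    escaped = &c;
    c.a = geta();
    c.b = getb();       // observes c.a.d == 1.0
    escaped = nullptr;
    return c;
}
```

With these assumed bodies the two functions return different values for b.s (0.0 versus 1.0), so a compiler could not turn one into the other. In the question's f and g nothing like escaped exists, which is why storing straight into c.a would be safe there.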

Surprisingly, GCC, clang, ICC, and MSVC all miss this optimization, most using call-preserved registers to hold the geta() return value until after getb() (https://godbolt.org/z/WQ9MAF). That is at least the case for x86-64; I mostly didn't check other ISAs or older compiler versions.

Fun fact: clang 3.5 has this missed-optimization even for g(), defeating the source code's attempt to be efficient.

Fun fact #2: With GCC 9.2, compiling as C instead of C++ makes GCC do a much worse job, deoptimizing g(). (I had to change the struct definitions to the form typedef struct Atag {...} A; but compiling that version as C++ still optimizes g(): https://godbolt.org/z/_Y95nj)

clang 8.0 produces an efficient g() with or without -xc, and ICC produces an inefficient g() either way.

ICC's f() is even worse than its g().

MSVC's g() is about as efficient as you could hope for; the Windows x64 calling convention returns the struct by hidden pointer and MSVC never optimizes that to passing a pointer to its own return-value object. (Which it probably couldn't prove is safe anyway, if its own caller was also potentially doing such optimizations.)
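As a rough illustration of the hidden-pointer point, the sketch below writes the return-by-pointer convention out by hand in C++. It is not compiler output. It assumes A has a hypothetical second member (extra) so that it is too large to come back in a register, and the names geta_lowered, g_with_temporary and g_forwarding are invented for this example.

```cpp
struct A {
    double d;
    double extra;  // hypothetical member, making A larger than 8 bytes
};
struct B { double s; };
struct C { A a; B b; };

// Hand-written stand-in for "A geta()" returning through a hidden pointer.
void geta_lowered(A* out) {
    out->d = 1.0;
    out->extra = 2.0;
}

// Assumed stand-in body; B is small enough to be returned by value here.
B getb() { return B{3.0}; }

// The callee's result lands in a temporary and is then copied into the
// caller-provided return object.
void g_with_temporary(C* ret) {
    A tmp;
    geta_lowered(&tmp);
    ret->a = tmp;
    ret->b = getb();
}

// The optimization described above as not happening: hand the callee a
// pointer into our own return object, so no copy is needed.
void g_forwarding(C* ret) {
    geta_lowered(&ret->a);
    ret->b = getb();
}
```

Both versions fill the result identically in this sketch. The forwarding version is only valid if the callee cannot observe *ret in some other way while it runs, which is the safety question raised in the parenthetical above.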

Obviously if geta() and getb() can inline, that removes any doubt for the optimizer and it should do the optimization more easily / reliably.

answered Sep 28 '22 17:09

Peter Cordes
