Why does GCC fail to optimize unless the return value has a name?

Tags:

Consider this code:

#include <array>

class C
{
    std::array<char, 7> a{};
    int b{};
};

C slow()
{
    return {};
}

C fast()
{
    C c;
    return c;
}

GCC 6 through 9 produce very bloated code for slow():

slow():
        xor     eax, eax
        mov     DWORD PTR [rsp-25], 0
        mov     BYTE PTR [rsp-21], 0
        mov     edx, DWORD PTR [rsp-24]
        mov     DWORD PTR [rsp-32], 0
        mov     WORD PTR [rsp-28], ax
        mov     BYTE PTR [rsp-26], 0
        mov     rax, QWORD PTR [rsp-32]
        ret
fast():
        xor     eax, eax
        xor     edx, edx
        ret

Is there a difference in meaning between the two functions? Clang emits code like fast() for both, while GCC 4-5 do a better job than 6-9, but not quite optimal either.

Build flags: -std=c++11 -O3

Demo: https://godbolt.org/z/rPNG9o

Submitted as a GCC bug based on the feedback here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

703

asked Jun 13 '19 10:06

John Zwinck

3 Answers

The two functions are equivalents: the returned object (more precisely, the result object of an hypothetical call to these functions) is initialized by initializing each member using its default member initializer.

For `slow`:

The pr-value result of the call to slow is copy-inizialized with {} as the initializer (stmt.return)
so the resulting object is list-initialized ([dcl.init]/17.1);
which lead us to aggregate-initializion ([dcl.init.list]/3.4)

=> so all members of the result object of a call to slow are initialized with their default member initializer dcl.init.aggr]/5.4.

For `fast`:

First we assume that copy elision is performed [class.copy.elision]/1.1
so the result object of the function call is default initialized [dcl.init]/12
so the implicitly declared default constructor is called [dcl.init]/7.1.
this default constructor is equivalent to C(){} ([class.default.ctor]/4)

=> so all members of the result object of a call to slow are initialized with their default member initializer [class.base.init]/9.1

The resulting assemblies of the two function are functionally equivalent. So the assembly produced by Gcc is standard compliant.

In the case of slow, the assembly is just sub-optimal. The object is returned accordingly to the SystemV x86 abi on two registers: rax and rdx (edx). First it zeroes an object conceptualy of class C on the stack at addresses [rsp-32]. It zeroes a the padding byte between a and b and b. Then it copies this initialized part of the stack on the registers. The way it zeroes the stack is just suboptimal, and whatsoever all these operations are equivalent to the 2 xor operations of fast assembly. So that is just an obvious bug.

answered Oct 12 '22 00:10

Oliv

This is not really the complete answer but it might give a clue. As I suspected there is a subtle difference in meaning to fast and slow which probably sends the compiler down different paths. You can see this if you make the copy constructor private.

https://godbolt.org/z/FMIRe3

#include <array>

class C
{
    std::array<char, 7> a{};

    public:
    C(){}

    private:
    C(const C & c){}
};

// Compiles
C slow()
{
    return {};
}

// Does not compile
C fast()
{
    C c;
    return c;
}

Even with copy ellision fast still requires the copy constructor to be there where as slow is returning an initialization list which explicitly constructs the return value by the caller. These may or may not end up doing the same thing but I believe the compiler has to do some crunching to determine if this is the case.

There is a detailed blog post that gives some interesting background on this topic

https://akrzemi1.wordpress.com/2018/05/16/rvalues-redefined/

However the behaviour has changed in C++17

Whereas

#include <array>

class C
{
    std::array<char, 7> a{};

    public:
    C(){}

    private:
    C(const C & c){}
};

C slow()
{
    return {};
}

C fast()
{
    return C();
}

fast would fail to compile under C++11 it now compiles under C++17

https://godbolt.org/z/JG2PkD

The reason is that the meaning of return C() changes from returning a temporary to explicitly constructing the object in the frame of the caller.

So now in C++17 there is a big difference between

C fast(){
    C c;
    return c;
}

and

C fast(){
    return C();
}

because in the second one you don't even need a copy or move constructor to be available.

https://godbolt.org/z/i2eZnf

Definitely not C++ 101

answered Oct 12 '22 00:10

bradgonesurfing

The GCC maintainers agreed this was a bug (missed optimization), and it's fixed in trunk for x86_64 (ARM may be fixed later): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883

answered Oct 11 '22 22:10

John Zwinck

Related questions
                            
                                Boost python getter/setter with the same name
                            
                                Hiding a private overloaded virtual function?
                            
                                Reading a huge file into a C++ vector in Ubuntu (linux OS)
                            
                                why allocate_shared and make_shared so slow
                            
                                Android Studio External Native Build Precompiled Headers
                            
                                Wrong CRLF in UTF-16 stream?
                            
                                Windows error reporting and out of range exceptions
                            
                                How do I return a native object from a class derived from Nan::ObjectWrap?
                            
                                Trying to understand std::forward, std::move a little better
                            
                                Can auto placeholder be used to deduce function result in non-type template parameter?
                            
                                Templated Variables Bug With Lambdas in Visual Studio?
                            
                                A type trait to detect functors using C++17?
                            
                                Atollic TrueSTUDIO: How to convert from C to C++?
                            
                                How are symbols contained in the libpythonX.X linked to numpy extension dynamic libraries?
                            
                                Is there any preference linker gives to static symbols or dynamic symbols?
                            
                                Is this a valid way of performing "Expression SFINAE" in C++03?
                            
                                v8 Locker for multi thread
                            
                                How to parallelize a plain for loop using the C++ standard library
                            
                                Advantage of gsl assert vs assert in c++?
                            
                                Does this code result in a materialized base prvalue, and should it compile?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why does GCC fail to optimize unless the return value has a name?

Tags:

c++

optimization

gcc

g++

John Zwinck

People also ask

3 Answers

For `slow`:

For `fast`:

Oliv

bradgonesurfing

John Zwinck

Recent Activity

Donate For Us

Why does GCC fail to optimize unless the return value has a name?

Tags:

c++

optimization

gcc

g++

John Zwinck

People also ask

3 Answers

For slow:

For fast:

Oliv

bradgonesurfing

John Zwinck

Related questions

Recent Activity

Donate For Us

For `slow`:

For `fast`: