Consider this code:
#include <array>
class C
{
std::array<char, 7> a{};
int b{};
};
C slow()
{
return {};
}
C fast()
{
C c;
return c;
}
GCC 6 through 9 produce very bloated code for slow()
:
slow():
xor eax, eax
mov DWORD PTR [rsp-25], 0
mov BYTE PTR [rsp-21], 0
mov edx, DWORD PTR [rsp-24]
mov DWORD PTR [rsp-32], 0
mov WORD PTR [rsp-28], ax
mov BYTE PTR [rsp-26], 0
mov rax, QWORD PTR [rsp-32]
ret
fast():
xor eax, eax
xor edx, edx
ret
Is there a difference in meaning between the two functions? Clang emits code like fast()
for both, while GCC 4-5 do a better job than 6-9, but not quite optimal either.
Build flags: -std=c++11 -O3
Demo: https://godbolt.org/z/rPNG9o
Submitted as a GCC bug based on the feedback here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883
Compiler specific pragma gcc provides pragma GCC as a way to control temporarily the compiler behavior. By using pragma GCC optimize("O0") , the optimization level can be set to zero, which means absolutely no optimize for gcc.
GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall compiler optimization level is controlled by the command line option -On, where n is the required optimization level, as follows: -O0 . (default).
Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program. The compiler performs optimization based on the knowledge it has of the program.
The two functions are equivalents: the returned object (more precisely, the result object of an hypothetical call to these functions) is initialized by initializing each member using its default member initializer.
slow
:{}
as the initializer (stmt.return)=> so all members of the result object of a call to slow
are initialized with their default member initializer dcl.init.aggr]/5.4.
fast
:C(){}
([class.default.ctor]/4)=> so all members of the result object of a call to slow
are initialized with their default member initializer [class.base.init]/9.1
The resulting assemblies of the two function are functionally equivalent. So the assembly produced by Gcc is standard compliant.
In the case of slow, the assembly is just sub-optimal. The object is returned accordingly to the SystemV x86 abi on two registers: rax and rdx (edx). First it zeroes an object conceptualy of class C on the stack at addresses [rsp-32]. It zeroes a
the padding byte between a
and b
and b
. Then it copies this initialized part of the stack on the registers. The way it zeroes the stack is just suboptimal, and whatsoever all these operations are equivalent to the 2 xor operations of fast
assembly. So that is just an obvious bug.
This is not really the complete answer but it might give a clue. As I suspected there is a subtle difference in meaning to fast
and slow
which probably sends the compiler down different paths. You can see this if you make the copy constructor private.
https://godbolt.org/z/FMIRe3
#include <array>
class C
{
std::array<char, 7> a{};
public:
C(){}
private:
C(const C & c){}
};
// Compiles
C slow()
{
return {};
}
// Does not compile
C fast()
{
C c;
return c;
}
Even with copy ellision fast
still requires the copy constructor to be there where as slow
is returning an initialization list
which explicitly constructs the return value by the caller. These may or may not end up doing the same thing but I believe the compiler has to do some crunching to determine if this is the case.
There is a detailed blog post that gives some interesting background on this topic
https://akrzemi1.wordpress.com/2018/05/16/rvalues-redefined/
However the behaviour has changed in C++17
Whereas
#include <array>
class C
{
std::array<char, 7> a{};
public:
C(){}
private:
C(const C & c){}
};
C slow()
{
return {};
}
C fast()
{
return C();
}
fast
would fail to compile under C++11 it now compiles under C++17
https://godbolt.org/z/JG2PkD
The reason is that the meaning of return C()
changes from returning a temporary to explicitly constructing the object in the frame of the caller.
So now in C++17 there is a big difference between
C fast(){
C c;
return c;
}
and
C fast(){
return C();
}
because in the second one you don't even need a copy or move constructor to be available.
https://godbolt.org/z/i2eZnf
Definitely not C++ 101
The GCC maintainers agreed this was a bug (missed optimization), and it's fixed in trunk for x86_64 (ARM may be fixed later): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90883
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With