Different machine code for empty default constructor v. implicitly-defined one

Tags: c++, optimization, gcc, x86-64, c++20

Given the following struct...

#include <type_traits>

struct C {
    long a[16]{};
    long b[16]{};

    C() = default;
};

// For godbolt
C construct() {
    static_assert(not std::is_trivial_v<C>);
    static_assert(std::is_standard_layout_v<C>);

    C c;
    return c;
}

...gcc (version 10.2 on x86-64 Linux) with optimization enabled (at all three levels) produces the following assembly [1] for construct:

construct():
        mov     r8, rdi
        xor     eax, eax
        mov     ecx, 32
        rep stosq
        mov     rax, r8
        ret

Once I provide an empty default constructor...

#include <type_traits>

struct C {
    long a[16]{};
    long b[16]{};

    C() {}  // <-- The only change
};

// For godbolt
C construct() {
    static_assert(not std::is_trivial_v<C>);
    static_assert(std::is_standard_layout_v<C>);

    C c;
    return c;
}

...the generated assembly changes to initializing each field individually instead of the single memset in the original:

construct():
        mov     rdx, rdi
        mov     eax, 0
        mov     ecx, 16
        rep stosq
        lea     rdi, [rdx+128]
        mov     ecx, 16
        rep stosq
        mov     rax, rdx
        ret
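For readers less used to `rep stosq`: each iteration stores one 8-byte quadword, so `ecx = 32` clears 256 bytes (the whole object), while `ecx = 16` clears 128 bytes (one array). A rough C++ rendering of the two listings follows. It is only a sketch, assuming an 8-byte `long` as on x86-64 Linux; `Raw`, `zero_whole`, and `zero_per_member` are names made up for this illustration and do not appear in the original code.

```cpp
#include <cstring>

// Same members as C, but without initializers, so it can be zeroed by hand.
struct Raw {
    long a[16];
    long b[16];
};

// The object is 32 longs; with an 8-byte long (assumed) that is 256 bytes.
static_assert(sizeof(Raw) == 32 * sizeof(long), "two arrays of 16 longs, no padding");

// Roughly the first listing: one fill over the whole object (ecx = 32).
void zero_whole(Raw& r) {
    std::memset(&r, 0, sizeof r);
}

// Roughly the second listing: one fill per member (ecx = 16, twice),
// the second one starting at offset 128 under the 8-byte-long assumption.
void zero_per_member(Raw& r) {
    std::memset(r.a, 0, sizeof r.a);
    std::memset(r.b, 0, sizeof r.b);
}
```

Both functions leave the object in the same state; the difference is only in how many fill operations are issued.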

Apparently, the two structs are equivalent in that neither is trivial and both are standard layout. Is it just gcc missing an optimization opportunity, or is there more to it from the C++-the-language perspective?
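To make the claimed equivalence concrete, the two variants can be put side by side under different names and checked with the same type traits. This is a sketch only: `Defaulted`, `EmptyBody`, `same_category`, and `default_constructs_to_zero` are hypothetical names introduced here, and the extra traits beyond the two in the question are an addition for illustration.

```cpp
#include <type_traits>

struct Defaulted {
    long a[16]{};
    long b[16]{};
    Defaulted() = default;
};

struct EmptyBody {
    long a[16]{};
    long b[16]{};
    EmptyBody() {}
};

// Properties shared by both variants: not trivial (because of the default
// member initializers), but standard layout and trivially copyable.
template <class T>
constexpr bool same_category =
    !std::is_trivial_v<T> &&
    !std::is_trivially_default_constructible_v<T> &&
    std::is_standard_layout_v<T> &&
    std::is_trivially_copyable_v<T>;

static_assert(same_category<Defaulted>);
static_assert(same_category<EmptyBody>);
static_assert(sizeof(Defaulted) == sizeof(EmptyBody));

// Runtime check that default construction zeroes every element.
template <class T>
bool default_constructs_to_zero() {
    T t;
    for (long v : t.a) if (v != 0) return false;
    for (long v : t.b) if (v != 0) return false;
    return true;
}
```

As far as these traits and the observable state go, the two types behave the same, which is why the difference in generated code is surprising.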

The example is a stripped-down version of production code where this made a material difference in performance.

[1] Godbolt: https://godbolt.org/z/8n1Mae

asked Jan 24 '21 13:01

Ilya Kurnosov

1 Answer

While I agree that this seems like a missed optimization opportunity, I noticed one difference from the language level perspective. The implicitly-defined constructor is constexpr while the empty default constructor in your example is not. From cppreference.com:

That is, [the implicitly-defined constructor] calls the default constructors of the bases and of the non-static members of this class. If this satisfies the requirements of a constexpr constructor, the generated constructor is constexpr (since C++11).

Since the initialization of the long arrays is a constant expression, the implicitly-defined constructor is constexpr as well. However, the user-defined one is not, as it is not marked constexpr. We can confirm this by trying to make the construct function of the example constexpr. For the implicitly-defined constructor this works without any problems, but for the empty user-defined version it fails to compile with the following note:

<source>:3:8: note: 'C' is not an aggregate, does not have a trivial default constructor, and has no 'constexpr' constructor that is not a copy or move constructor

as we can see here: https://godbolt.org/z/MnsbzKv1v
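The experiment can be reproduced in a few lines. The following is a sketch of the idea rather than the exact code behind the link; `Defaulted`, `EmptyBody`, `make_defaulted`, and `kDefaulted` are names made up here so that both variants fit in one file.

```cpp
struct Defaulted {
    long a[16]{};
    long b[16]{};
    Defaulted() = default;   // implicitly constexpr
};

struct EmptyBody {
    long a[16]{};
    long b[16]{};
    EmptyBody() {}           // user-provided and not marked constexpr
};

// Accepted: the defaulted constructor can run during constant evaluation.
constexpr Defaulted make_defaulted() {
    Defaulted d;
    return d;
}

// Forces the call to be evaluated at compile time.
constexpr Defaulted kDefaulted = make_defaulted();
static_assert(kDefaulted.a[0] == 0 && kDefaulted.b[15] == 0);

// The analogous function for EmptyBody is rejected, because EmptyBody has
// no constexpr constructor and is therefore not a literal type:
// constexpr EmptyBody make_empty() { EmptyBody e; return e; }
```

Uncommenting the last line should reproduce a diagnostic along the lines of the note quoted above.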

So to fix this difference we can make the empty user-defined constructor constexpr:

struct C {
    long a[16]{};
    long b[16]{};

    constexpr C() {}
};

Somewhat surprisingly, gcc now generates the optimized version, i.e. the exact same code as for the defaulted default constructor: https://godbolt.org/z/cchTnEhKW

I do not know why, but this difference in constexpr-ness actually seems to help the compiler in this case. So while gcc should arguably be able to generate the same code without constexpr being specified, it is good to know that adding it can be beneficial.

As an additional test of this observation, we can make the implicitly-defined constructor non-constexpr and see whether gcc then fails to do the optimization. One simple way to do this is to have C inherit from an empty class with a non-constexpr default constructor:

struct D {
    D() {}
};

struct C : D {
    long a[16]{};
    long b[16]{};

    C() = default;
};

And indeed, this generates the assembly that initializes the fields individually again. Once we make D() constexpr, we get the optimized code back. See https://godbolt.org/z/esYhc1cfW.
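For completeness, the variant with a constexpr base constructor might look like the sketch below. The `static_assert` is an addition for illustration: it only confirms that the defaulted `C()` is usable in constant evaluation again, and says nothing by itself about the generated assembly.

```cpp
struct D {
    constexpr D() {}   // now constexpr, the only change to D
};

struct C : D {
    long a[16]{};
    long b[16]{};

    C() = default;     // constexpr again, since D() is constexpr
};

constexpr C construct() {
    C c;
    return c;
}

// Compiles only if the defaulted C() can be evaluated at compile time.
static_assert(construct().a[0] == 0 && construct().b[15] == 0);
```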

answered Oct 20 '22 01:10

mjacobse
