On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776 byte executable:
    #include <array>
    #include <iostream>

    int main()
    {
        constexpr std::size_t size = 4096;

        struct S
        {
            float f;
            S() : f(0.0f) {}
        };

        std::array<S, size> a = {};  // <-- note aggregate initialization

        for (auto& e : a)
            std::cerr << e.f;

        return 0;
    }

Increasing size seems to increase compilation time and executable size linearly. I cannot reproduce this behaviour with either clang 3.5 or Visual C++ 2015. Using -Os makes no difference.
    $ time g++ -O2 -std=c++11 test.cpp
    real    0m4.178s
    user    0m4.060s
    sys     0m0.068s

Inspecting the assembly code reveals that the initialization of a is unrolled, generating 4096 movl instructions:
    main:
    .LFB1313:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        subq    $16384, %rsp
        .cfi_def_cfa_offset 16400
        movl    $0x00000000, (%rsp)
        movl    $0x00000000, 4(%rsp)
        movq    %rsp, %rbx
        movl    $0x00000000, 8(%rsp)
        movl    $0x00000000, 12(%rsp)
        movl    $0x00000000, 16(%rsp)
        [...skipping 4000 lines...]
        movl    $0x00000000, 16376(%rsp)
        movl    $0x00000000, 16380(%rsp)

This only happens when T has a non-trivial constructor and the array is initialized using {}. If I do any of the following, g++ generates a simple loop:
1. Remove S::S();
2. Remove S::S() and initialize S::f in-class;
3. Remove the aggregate initialization (= {});
4. Compile without -O2.

I'm all for loop unrolling as an optimization, but I don't think this is a very good one. Before I report this as a bug, can someone confirm whether this is the expected behaviour?
[edit: I've opened a new bug for this because the others don't seem to match. They were more about long compilation time than weird codegen.]
There appears to be a related bug report, Bug 59659 - large zero-initialized std::array compile time excessive. It was considered "fixed" for 4.9.0, so I consider this test case either a regression or an edge case not covered by the patch. For what it's worth, two of the bug report's test cases exhibit symptoms for me on both GCC 4.9.0 and 5.3.1.
There are two more related bug reports:
Bug 68203 - About infinite compilation time on struct with nested array of pairs with -std=c++11
Andrew Pinski 2015-11-04 07:56:57 UTC
This is most likely a memory hog which is generating lots of default constructors rather than a loop over them.
That one claims to be a duplicate of this one:
Bug 56671 - Gcc uses large amounts of memory and processor power with large C++11 bitsets
Jonathan Wakely 2016-01-26 15:12:27 UTC
Generating the array initialization for this constexpr constructor is the problem:
    constexpr _Base_bitset(unsigned long long __val) noexcept : _M_w{ _WordT(__val) } { }
Indeed if we change it to S a[4096] {}; we don't get the problem.
Using perf we can see where GCC is spending most of its time. First:
    perf record g++ -std=c++11 -O2 test.cpp
Then perf report:
    10.33%  cc1plus   cc1plus                 [.] get_ref_base_and_extent
     6.36%  cc1plus   cc1plus                 [.] memrefs_conflict_p
     6.25%  cc1plus   cc1plus                 [.] vn_reference_lookup_2
     6.16%  cc1plus   cc1plus                 [.] exp_equiv_p
     5.99%  cc1plus   cc1plus                 [.] walk_non_aliased_vuses
     5.02%  cc1plus   cc1plus                 [.] find_base_term
     4.98%  cc1plus   cc1plus                 [.] invalidate
     4.73%  cc1plus   cc1plus                 [.] write_dependence_p
     4.68%  cc1plus   cc1plus                 [.] estimate_calls_size_and_time
     4.11%  cc1plus   cc1plus                 [.] ix86_find_base_term
     3.41%  cc1plus   cc1plus                 [.] rtx_equal_p
     2.87%  cc1plus   cc1plus                 [.] cse_insn
     2.77%  cc1plus   cc1plus                 [.] record_store
     2.66%  cc1plus   cc1plus                 [.] vn_reference_eq
     2.48%  cc1plus   cc1plus                 [.] operand_equal_p
     1.21%  cc1plus   cc1plus                 [.] integer_zerop
     1.00%  cc1plus   cc1plus                 [.] base_alias_check

This won't mean much to anyone but GCC developers, but it's still interesting to see what's taking up so much compilation time.
Clang 3.7.0 does a much better job at this than GCC. At -O2 it takes less than a second to compile, produces a much smaller executable (8960 bytes) and this assembly:
    0000000000400810 <main>:
      400810:   53                      push   rbx
      400811:   48 81 ec 00 40 00 00    sub    rsp,0x4000
      400818:   48 8d 3c 24             lea    rdi,[rsp]
      40081c:   31 db                   xor    ebx,ebx
      40081e:   31 f6                   xor    esi,esi
      400820:   ba 00 40 00 00          mov    edx,0x4000
      400825:   e8 56 fe ff ff          call   400680 <memset@plt>
      40082a:   66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]
      400830:   f3 0f 10 04 1c          movss  xmm0,DWORD PTR [rsp+rbx*1]
      400835:   f3 0f 5a c0             cvtss2sd xmm0,xmm0
      400839:   bf 60 10 60 00          mov    edi,0x601060
      40083e:   e8 9d fe ff ff          call   4006e0 <_ZNSo9_M_insertIdEERSoT_@plt>
      400843:   48 83 c3 04             add    rbx,0x4
      400847:   48 81 fb 00 40 00 00    cmp    rbx,0x4000
      40084e:   75 e0                   jne    400830 <main+0x20>
      400850:   31 c0                   xor    eax,eax
      400852:   48 81 c4 00 40 00 00    add    rsp,0x4000
      400859:   5b                      pop    rbx
      40085a:   c3                      ret
      40085b:   0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]

GCC 5.3.1, on the other hand, compiles very quickly with no optimizations but still produces a 95328 byte executable. Compiling with -O2 reduces the executable size to 53912 bytes, but compilation then takes 4 seconds. I would definitely report this to their Bugzilla.
Your GCC bug 71165, later merged with bug 92385, has been fixed in GCC 12.
https://gcc.godbolt.org/z/eGMq16esP