On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776 byte executable:
    #include <array>
    #include <iostream>

    int main()
    {
        constexpr std::size_t size = 4096;

        struct S
        {
            float f;
            S() : f(0.0f) {}
        };

        std::array<S, size> a = {};  // <-- note aggregate initialization

        for (auto& e : a)
            std::cerr << e.f;

        return 0;
    }

Increasing `size` seems to increase compilation time and executable size linearly. I cannot reproduce this behaviour with either clang 3.5 or Visual C++ 2015. Using `-Os` makes no difference.
    $ time g++ -O2 -std=c++11 test.cpp
    real    0m4.178s
    user    0m4.060s
    sys     0m0.068s

Inspecting the assembly code reveals that the initialization of `a` is unrolled, generating 4096 `movl` instructions:

    main:
    .LFB1313:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        subq    $16384, %rsp
        .cfi_def_cfa_offset 16400
        movl    $0x00000000, (%rsp)
        movl    $0x00000000, 4(%rsp)
        movq    %rsp, %rbx
        movl    $0x00000000, 8(%rsp)
        movl    $0x00000000, 12(%rsp)
        movl    $0x00000000, 16(%rsp)
        [...skipping 4000 lines...]
        movl    $0x00000000, 16376(%rsp)
        movl    $0x00000000, 16380(%rsp)
This only happens when `T` has a non-trivial constructor and the array is initialized using `{}`. If I do any of the following, g++ generates a simple loop:
- Remove `S::S()`;
- Remove `S::S()` and initialize `S::f` in-class;
- Remove the aggregate initialization (`= {}`);
- Compile without `-O2`.

I'm all for loop unrolling as an optimization, but I don't think this is a very good one. Before I report this as a bug, can someone confirm whether this is the expected behaviour?
[edit: I've opened a new bug for this because the others don't seem to match. They were more about long compilation time than weird codegen.]
There appears to be a related bug report, Bug 59659 - large zero-initialized std::array compile time excessive. It was considered "fixed" for 4.9.0, so I consider this test case either a regression or an edge case not covered by the patch. For what it's worth, two of the bug report's test cases exhibit symptoms for me on both GCC 4.9.0 and 5.3.1.
There are two more related bug reports:
Bug 68203 - About infinite compilation time on struct with nested array of pairs with -std=c++11
Andrew Pinski 2015-11-04 07:56:57 UTC
This is most likely a memory hog which is generating lots of default constructors rather than a loop over them.
That one claims to be a duplicate of this one:
Bug 56671 - Gcc uses large amounts of memory and processor power with large C++11 bitsets
Jonathan Wakely 2016-01-26 15:12:27 UTC
Generating the array initialization for this constexpr constructor is the problem:
constexpr _Base_bitset(unsigned long long __val) noexcept : _M_w{ _WordT(__val) } { }
Indeed, if we change it to `S a[4096] {};` we don't get the problem.
Using `perf` we can see where GCC is spending most of its time. First:

    perf record g++ -std=c++11 -O2 test.cpp

Then `perf report`:

    10.33%  cc1plus  cc1plus  [.] get_ref_base_and_extent
     6.36%  cc1plus  cc1plus  [.] memrefs_conflict_p
     6.25%  cc1plus  cc1plus  [.] vn_reference_lookup_2
     6.16%  cc1plus  cc1plus  [.] exp_equiv_p
     5.99%  cc1plus  cc1plus  [.] walk_non_aliased_vuses
     5.02%  cc1plus  cc1plus  [.] find_base_term
     4.98%  cc1plus  cc1plus  [.] invalidate
     4.73%  cc1plus  cc1plus  [.] write_dependence_p
     4.68%  cc1plus  cc1plus  [.] estimate_calls_size_and_time
     4.11%  cc1plus  cc1plus  [.] ix86_find_base_term
     3.41%  cc1plus  cc1plus  [.] rtx_equal_p
     2.87%  cc1plus  cc1plus  [.] cse_insn
     2.77%  cc1plus  cc1plus  [.] record_store
     2.66%  cc1plus  cc1plus  [.] vn_reference_eq
     2.48%  cc1plus  cc1plus  [.] operand_equal_p
     1.21%  cc1plus  cc1plus  [.] integer_zerop
     1.00%  cc1plus  cc1plus  [.] base_alias_check
This won't mean much to anyone but GCC developers but it's still interesting to see what's taking up so much compilation time.
Clang 3.7.0 does a much better job at this than GCC. At `-O2` it takes less than a second to compile, produces a much smaller executable (8960 bytes) and this assembly:

    0000000000400810 <main>:
      400810: 53                    push   rbx
      400811: 48 81 ec 00 40 00 00  sub    rsp,0x4000
      400818: 48 8d 3c 24           lea    rdi,[rsp]
      40081c: 31 db                 xor    ebx,ebx
      40081e: 31 f6                 xor    esi,esi
      400820: ba 00 40 00 00        mov    edx,0x4000
      400825: e8 56 fe ff ff        call   400680 <memset@plt>
      40082a: 66 0f 1f 44 00 00     nop    WORD PTR [rax+rax*1+0x0]
      400830: f3 0f 10 04 1c        movss  xmm0,DWORD PTR [rsp+rbx*1]
      400835: f3 0f 5a c0           cvtss2sd xmm0,xmm0
      400839: bf 60 10 60 00        mov    edi,0x601060
      40083e: e8 9d fe ff ff        call   4006e0 <_ZNSo9_M_insertIdEERSoT_@plt>
      400843: 48 83 c3 04           add    rbx,0x4
      400847: 48 81 fb 00 40 00 00  cmp    rbx,0x4000
      40084e: 75 e0                 jne    400830 <main+0x20>
      400850: 31 c0                 xor    eax,eax
      400852: 48 81 c4 00 40 00 00  add    rsp,0x4000
      400859: 5b                    pop    rbx
      40085a: c3                    ret
      40085b: 0f 1f 44 00 00        nop    DWORD PTR [rax+rax*1+0x0]
On the other hand, GCC 5.3.1 with no optimizations compiles very quickly but still produces a 95,328 byte executable. Compiling with `-O2` reduces the executable size to 53,912 bytes, but compilation then takes 4 seconds. I would definitely report this to their Bugzilla.
Your GCC bug 71165, later merged with bug 92385, has been fixed in GCC 12.
https://gcc.godbolt.org/z/eGMq16esP