 

std::array with aggregate initialization on g++ generates huge code


On g++ 4.9.2 and 5.3.1, this code takes several seconds to compile and produces a 52,776-byte executable:

    #include <array>
    #include <iostream>

    int main()
    {
        constexpr std::size_t size = 4096;

        struct S
        {
            float f;
            S() : f(0.0f) {}
        };

        std::array<S, size> a = {};  // <-- note aggregate initialization

        for (auto& e : a)
            std::cerr << e.f;

        return 0;
    }

Increasing size seems to increase compilation time and executable size linearly. I cannot reproduce this behaviour with either clang 3.5 or Visual C++ 2015. Using -Os makes no difference.

    $ time g++ -O2 -std=c++11 test.cpp

    real    0m4.178s
    user    0m4.060s
    sys     0m0.068s

Inspecting the assembly reveals that the initialization of the array a is fully unrolled into 4096 movl instructions:

    main:
    .LFB1313:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        subq    $16384, %rsp
        .cfi_def_cfa_offset 16400
        movl    $0x00000000, (%rsp)
        movl    $0x00000000, 4(%rsp)
        movq    %rsp, %rbx
        movl    $0x00000000, 8(%rsp)
        movl    $0x00000000, 12(%rsp)
        movl    $0x00000000, 16(%rsp)
           [...skipping 4000 lines...]
        movl    $0x00000000, 16376(%rsp)
        movl    $0x00000000, 16380(%rsp)
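For illustration, the two code shapes can be modelled in plain C++. This is only a sketch: the function names are hypothetical, and a float buffer stands in for the S::f members. The first function mirrors the unrolled output above with one store per element, so its size grows with the element count; the second is the loop form, whose size stays constant.

```cpp
#include <cassert>
#include <cstddef>

// Unrolled shape, written out for 4 elements: one store per element,
// so the code grows linearly with the number of elements.
void zero_unrolled_4(float* p)
{
    assert(p != nullptr);
    p[0] = 0.0f;
    p[1] = 0.0f;
    p[2] = 0.0f;
    p[3] = 0.0f;
}

// Loop shape: the same store executed n times, so the code size
// does not depend on n.
void zero_loop(float* p, std::size_t n)
{
    assert(p != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i)
        p[i] = 0.0f;
}
```

With size = 4096, the unrolled shape corresponds to the 4096 movl instructions in the listing, while the loop shape stays a handful of instructions regardless of size.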

This only happens when the element type (S here) has a non-trivial constructor and the array is initialized using {}. If I do any of the following, g++ generates a simple loop:

  1. Remove S::S();
  2. Remove S::S() and initialize S::f in-class;
  3. Remove the aggregate initialization (= {});
  4. Compile without -O2.
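A minimal C++ sketch of the variants above, assuming C++11 or later; WithCtor, Plain, InClassInit and variants() are hypothetical names standing in for the question's S. The type-trait checks show that variant 2 still has a non-trivial (though not user-provided) default constructor, so the trigger appears to be a user-provided constructor combined with = {}, rather than non-triviality alone. Variant 4 is a compiler flag and has no source-level counterpart.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Original element type: user-provided default constructor.
struct WithCtor {
    float f;
    WithCtor() : f(0.0f) {}
};

// Variant 1: no constructor at all, so the type is trivial.
struct Plain {
    float f;
};

// Variant 2: default member initializer instead of a constructor.
// The implicit default constructor is non-trivial but not user-provided.
struct InClassInit {
    float f = 0.0f;
};

constexpr std::size_t kSize = 4096;

void variants()
{
    std::array<WithCtor, kSize> original = {};     // the form from the question
    std::array<Plain, kSize> variant1 = {};        // value-initialized: every f is 0
    std::array<InClassInit, kSize> variant2 = {};  // every f set by the member initializer
    std::array<WithCtor, kSize> variant3;          // no "= {}": WithCtor() still runs per element
    // Variant 4 (building without -O2) is a compiler flag, not a source change.
    (void)original;
    (void)variant1;
    (void)variant2;
    (void)variant3;
}
```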

I'm all for loop unrolling as an optimization, but I don't think this is a very good one. Before I report this as a bug, can someone confirm whether this is the expected behaviour?

[edit: I've opened a new bug for this because the others don't seem to match. They were more about long compilation time than weird codegen.]

asked May 16 '16 at 17:05 by isanae



2 Answers

There appears to be a related bug report, Bug 59659 - large zero-initialized std::array compile time excessive. It was considered "fixed" for 4.9.0, so I consider this test case either a regression or an edge case not covered by the patch. For what it's worth, two of the bug report's test cases (1, 2) exhibit the symptoms for me on both GCC 4.9.0 and 5.3.1.

There are two more related bug reports:

Bug 68203 - Аbout infinite compilation time on struct with nested array of pairs with -std=c++11

Andrew Pinski 2015-11-04 07:56:57 UTC

This is most likely a memory hog which is generating lots of default constructors rather than a loop over them.

That one is marked as a duplicate of this one:

Bug 56671 - Gcc uses large amounts of memory and processor power with large C++11 bitsets

Jonathan Wakely 2016-01-26 15:12:27 UTC

Generating the array initialization for this constexpr constructor is the problem:

    constexpr _Base_bitset(unsigned long long __val) noexcept
    : _M_w{ _WordT(__val) } { }

Indeed, if we change it to a built-in array (S a[4096] {};), we don't get the problem.
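A minimal sketch contrasting the two forms, reusing the question's S; sum_std_array and sum_builtin_array are hypothetical helper names. Both should return 0.0f, since every element is initialized by S::S(); per the statement above, only the std::array form triggers the problem on affected GCC versions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct S {
    float f;
    S() : f(0.0f) {}
};

constexpr std::size_t kSize = 4096;

// Original form: std::array with empty-brace aggregate initialization.
float sum_std_array()
{
    std::array<S, kSize> a = {};
    float total = 0.0f;
    for (const auto& e : a)
        total += e.f;
    return total;
}

// Alternative form: a built-in array with empty braces.
float sum_builtin_array()
{
    S a[kSize] {};
    float total = 0.0f;
    for (const auto& e : a)
        total += e.f;
    return total;
}
```

A quick check such as assert(sum_std_array() == sum_builtin_array()); confirms both forms produce the same values; only their compile cost differs.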


Using perf we can see where GCC is spending most of its time. First:

perf record g++ -std=c++11 -O2 test.cpp

Then perf report:

      10.33%  cc1plus   cc1plus                 [.] get_ref_base_and_extent
       6.36%  cc1plus   cc1plus                 [.] memrefs_conflict_p
       6.25%  cc1plus   cc1plus                 [.] vn_reference_lookup_2
       6.16%  cc1plus   cc1plus                 [.] exp_equiv_p
       5.99%  cc1plus   cc1plus                 [.] walk_non_aliased_vuses
       5.02%  cc1plus   cc1plus                 [.] find_base_term
       4.98%  cc1plus   cc1plus                 [.] invalidate
       4.73%  cc1plus   cc1plus                 [.] write_dependence_p
       4.68%  cc1plus   cc1plus                 [.] estimate_calls_size_and_time
       4.11%  cc1plus   cc1plus                 [.] ix86_find_base_term
       3.41%  cc1plus   cc1plus                 [.] rtx_equal_p
       2.87%  cc1plus   cc1plus                 [.] cse_insn
       2.77%  cc1plus   cc1plus                 [.] record_store
       2.66%  cc1plus   cc1plus                 [.] vn_reference_eq
       2.48%  cc1plus   cc1plus                 [.] operand_equal_p
       1.21%  cc1plus   cc1plus                 [.] integer_zerop
       1.00%  cc1plus   cc1plus                 [.] base_alias_check

This won't mean much to anyone but GCC developers, but it's still interesting to see what's taking up so much compilation time.


Clang 3.7.0 does a much better job at this than GCC. At -O2 it takes less than a second to compile, produces a much smaller executable (8960 bytes) and this assembly:

    0000000000400810 <main>:
      400810:   53                      push   rbx
      400811:   48 81 ec 00 40 00 00    sub    rsp,0x4000
      400818:   48 8d 3c 24             lea    rdi,[rsp]
      40081c:   31 db                   xor    ebx,ebx
      40081e:   31 f6                   xor    esi,esi
      400820:   ba 00 40 00 00          mov    edx,0x4000
      400825:   e8 56 fe ff ff          call   400680 <memset@plt>
      40082a:   66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]
      400830:   f3 0f 10 04 1c          movss  xmm0,DWORD PTR [rsp+rbx*1]
      400835:   f3 0f 5a c0             cvtss2sd xmm0,xmm0
      400839:   bf 60 10 60 00          mov    edi,0x601060
      40083e:   e8 9d fe ff ff          call   4006e0 <_ZNSo9_M_insertIdEERSoT_@plt>
      400843:   48 83 c3 04             add    rbx,0x4
      400847:   48 81 fb 00 40 00 00    cmp    rbx,0x4000
      40084e:   75 e0                   jne    400830 <main+0x20>
      400850:   31 c0                   xor    eax,eax
      400852:   48 81 c4 00 40 00 00    add    rsp,0x4000
      400859:   5b                      pop    rbx
      40085a:   c3                      ret
      40085b:   0f 1f 44 00 00          nop    DWORD PTR [rax+rax*1+0x0]
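Clang can fold the 4096 constructor calls into the single memset call in this listing because S::S() only stores 0.0f, and on IEEE 754 platforms +0.0f is represented by all-zero bytes. A small sketch that checks this assumption at run time; the helper names are hypothetical, and the check is only meaningful where float is IEEE 754, which the static_assert enforces.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <limits>

struct S {
    float f;
    S() : f(0.0f) {}
};

// The memset shortcut relies on IEEE 754, where +0.0f is all-zero bits.
static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 floats");

// True if the object representation of 0.0f consists only of zero bytes.
bool zero_float_has_all_zero_bits()
{
    const float zero = 0.0f;
    unsigned char bytes[sizeof(float)];
    std::memcpy(bytes, &zero, sizeof zero);
    for (unsigned char b : bytes)
        if (b != 0)
            return false;
    return true;
}

// Compares a brace-initialized std::array<S, N> with a memset-zeroed
// buffer of the same size; equal bytes mean a memset is a valid substitute.
template <std::size_t N>
bool array_matches_memset()
{
    std::array<S, N> a = {};
    unsigned char zeros[sizeof(a)];
    std::memset(zeros, 0, sizeof zeros);
    return std::memcmp(&a, zeros, sizeof a) == 0;
}
```

For example, assert(array_matches_memset<16>()); holds on such platforms; a small N keeps the check cheap to compile.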

GCC 5.3.1, on the other hand, compiles very quickly without optimizations but still produces a 95,328-byte executable. Compiling with -O2 reduces the executable size to 53,912 bytes, but compilation takes 4 seconds. I would definitely report this on GCC's Bugzilla.

answered Oct 05 '22 at 23:10 by user6342117



Your GCC bug 71165, which was later merged with 92385, has been fixed in GCC 12.

https://gcc.godbolt.org/z/eGMq16esP

answered Oct 05 '22 at 22:10 by Giovanni Cerretani
