 

Compiling a large (~6 MB) map-initializing C++ file with gcc

Tags:

c++

gcc

g++

clang

I'm trying to compile a C++ file that is about 5.7 MB in size. I'm building a 64-bit Linux executable on a 64-bit Linux system. Unfortunately, g++ 4.7.2 is not cooperating:

g++: internal compiler error: Killed (program cc1plus)

Watching the process in top shows that it reaches about 2.2 GB of memory before that happens. I tried setting --param ggc-min-expand=0 and also played with --param ggc-min-heapsize, but neither resolved the problem. Disabling optimization with -O0 did not help either.

I also tried compiling with clang, with similar results: it segfaulted after likewise exceeding 2 GB of memory. I didn't try any extra options with clang because I'm not very familiar with it.

The source file in question consists of C++11-style initialization of a few maps.

#include <map>
#include <string>
typedef std::map<std::string, int> StringToIntMap;
StringToIntMap someData = {{"SOMESTRING", 1} /* , ... (about 300 000 entries in total) */};
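To see how the compiler's memory use scales before attempting all 300 000 entries, it can help to generate test files of different sizes. This is a minimal sketch under assumptions: the function name `writeMapInitializer` and the synthetic `K0`, `K1`, ... keys are hypothetical and do not come from the question; only the shape of the emitted file mirrors it.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>  // for capturing the output in a string, e.g. in tests
#include <string>

// Hypothetical generator: writes a translation unit shaped like the one in
// the question, with `n` synthetic entries {"K0", 0}, {"K1", 1}, ...
void writeMapInitializer(std::ostream& out, std::size_t n) {
    out << "#include <map>\n#include <string>\n"
        << "typedef std::map<std::string, int> StringToIntMap;\n"
        << "StringToIntMap someData = {\n";
    for (std::size_t i = 0; i < n; ++i) {
        // Comma after every entry except the last one.
        out << "    {\"K" << i << "\", " << i << "}"
            << (i + 1 < n ? ",\n" : "\n");
    }
    out << "};\n";
}
```

Writing the output to a std::ofstream for, say, 10 000, 50 000 and 100 000 entries and compiling each file while watching top gives a rough idea of whether cc1plus memory grows linearly with the entry count.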

Preferably, I'd like to compile the file with gcc, although I can live with clang if that works instead. It would also help to learn, from someone who knows the internals, what is happening behind the scenes. A map of 300 000 elements, with strings about 5 bytes long each mapped to an int, is only a few megabytes of data, and I can't readily see how the initializer blows that up to the point of requiring gigabytes to compile.

To preempt comments that I shouldn't have such a large source file: I know I can read the data from a data file at runtime, and that's what the program does now, but in my use case the program's execution time is the most important factor.

DUman asked Oct 04 '22 00:10


1 Answer

A compiler is allowed to impose implementation-defined limits on the quantities it supports for many language constructs.

Annex B of the standard lists recommended minimums for these quantities. As the annex itself states, they are guidelines and do not determine compliance.

From Annex B:

The limits may constrain quantities that include those described below or others. The bracketed number following each quantity is recommended as the minimum for that quantity. However, these quantities are only guidelines and do not determine compliance.

  • Nesting levels of compound statements, iteration control structures, and selection control structures [256].
  • Nesting levels of conditional inclusion [256].
  • Pointer, array, and function declarators (in any combination) modifying a class, arithmetic, or incomplete type in a declaration [256].
  • Nesting levels of parenthesized expressions within a full-expression [256].
  • Number of characters in an internal identifier or macro name [1 024].
  • Number of characters in an external identifier [1 024].
  • External identifiers in one translation unit [65 536].
  • Identifiers with block scope declared in one block [1 024].
  • Macro identifiers simultaneously defined in one translation unit [65 536].
  • Parameters in one function definition [256].
  • Arguments in one function call [256].
  • Parameters in one macro definition [256].
  • Arguments in one macro invocation [256].
  • Characters in one logical source line [65 536].
  • Characters in a string literal (after concatenation) [65 536].
  • Size of an object [262 144].
  • Nesting levels for #include files [256].
  • Case labels for a switch statement (excluding those for any nested switch statements) [16 384].
  • Data members in a single class [16 384].
  • Enumeration constants in a single enumeration [4 096].
  • Levels of nested class definitions in a single member-specification [256].
  • Functions registered by atexit() [32].
  • Functions registered by at_quick_exit() [32].
  • Direct and indirect base classes [16 384].
  • Direct base classes for a single class [1 024].
  • Members declared in a single class [4 096].
  • Final overriding virtual functions in a class, accessible or not [16 384].
  • Direct and indirect virtual bases of a class [1 024].
  • Static members of a class [1 024].
  • Friend declarations in a class [4 096].
  • Access control declarations in a class [4 096].
  • Member initializers in a constructor definition [6 144].
  • Scope qualifications of one identifier [256].
  • Nested external specifications [1 024].
  • Recursive constexpr function invocations [512].
  • Template arguments in a template declaration [1 024].
  • Recursively nested template instantiations, including substitution during template argument deduction (14.8.2) [1 024].
  • Handlers per try block [256].
  • Throw specifications on a single function declaration [256].
  • Number of placeholders (20.8.9.1.4) [10].

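An initializer list is effectively 'constructed' from a number of arguments, and apparently GCC doesn't cope with the volume you provided. Note also that std::string is not a literal type, so none of these elements can be built at compile time: most likely each of the 300 000 entries turns into runtime constructor calls (plus the cleanup code needed in case a later constructor throws), all inside a single static-initialization function. That generated code, rather than the few megabytes of raw data, is what the compiler has to hold in memory.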
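A simplified sketch of roughly what a brace-initialized map asks for, written out by hand (an illustration under assumptions, not literal compiler output): a temporary array of `std::pair<const std::string, int>` is materialized element by element at runtime, and the whole range is then handed to the map. The function name `buildLikeInitializerList` is hypothetical.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, int> StringToIntMap;

// Approximates `StringToIntMap m = {{"A", 1}, {"B", 2}};`.
StringToIntMap buildLikeInitializerList() {
    // Backing array: every element needs its own std::string constructor
    // call at runtime, because std::string is not a literal type.
    const std::pair<const std::string, int> backing[] = {
        std::pair<const std::string, int>(std::string("A"), 1),
        std::pair<const std::string, int>(std::string("B"), 2),
    };
    // The map then inserts (copies) every element from the range.
    return StringToIntMap(std::begin(backing), std::end(backing));
}
```

With two entries this is trivial; with 300 000 entries the per-element constructor and cleanup code is what accumulates, which is consistent with the memory growth described in the question.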

There might be options in the man page to alleviate this:

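  • -mlarge-data (which is the default; note that this is a DEC Alpha option)
  • -mlarge-text (also the default; likewise Alpha-specific, so neither applies to an x86-64 build)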
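A workaround that sidesteps the problem instead of relying on compiler flags, sketched under assumptions: store the entries as a sorted array of a plain struct holding `const char*` and `int`. Both are literal types, so the table can be constant-initialized as static data rather than built by hundreds of thousands of constructor calls, and keys are found by binary search. `Entry`, `kTable` and `lookup` are hypothetical names, the three entries stand in for the real data, and how much compile memory this saves on gcc 4.7.2 is untested.

```cpp
#include <algorithm>
#include <cstring>
#include <iterator>

// Plain aggregate of literal types: no runtime constructor per entry.
struct Entry {
    const char* key;
    int value;
};

// Must be sorted by key in strcmp order (a generator would sort it).
static const Entry kTable[] = {
    {"ALPHA", 1},
    {"BRAVO", 2},
    {"CHARLIE", 3},
};

// Binary search over the sorted table; on a hit, stores the value in `out`
// and returns true, otherwise returns false and leaves `out` untouched.
bool lookup(const char* key, int& out) {
    const Entry* first = std::begin(kTable);
    const Entry* last = std::end(kTable);
    const Entry* it = std::lower_bound(first, last, key,
        [](const Entry& e, const char* k) { return std::strcmp(e.key, k) < 0; });
    if (it != last && std::strcmp(it->key, key) == 0) {
        out = it->value;
        return true;
    }
    return false;
}
```

Lookups stay O(log n), as with std::map. The catch is that the generated file must be emitted in sorted order, for example by sorting the keys with std::sort before writing them out.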
sehe answered Oct 07 '22 16:10