
How to make C++ linking take less memory

I am working on a university research project in C++ with a lot of templates which have further nested templates and so on. The project is about efficient index data structures for a specific area of research. As you can imagine, an index structure has a lot of parameters to adjust, so we use template parameters extensively. Of course, we want to test our indexes with different sets of parameters, so there are quite a lot of template instantiations.

The project is not that huge, maybe 50k LOC. But still, linking takes 50 seconds and consumes over 7 GB of memory (!!!). I am on a 32 GB workstation, so everything is fine for me. I often have bachelor and master students working on this project, though, and they often work on laptops with 4 or 8 GB of RAM. These students have great trouble building the project. The resulting test binary (i.e. a binary that simply contains unit tests for the index structures) is 700 megabytes. Most of that is symbols, because the nested templates produce huge names. If I run strip on the binary, it drops down to 8 megabytes.

So is there a way to reduce the RAM usage during linkage? And is there a way to have smaller symbols even with nested templates?

We compile using g++ 4.9 with -std=c++11 under Ubuntu 14.10.

Edit:

It really seems to be the nested templates. We have two test cases with really deeply nested templates. The two .o files for these tests make up almost 90% of the size of the final binary. They result in method names that are over 3000 characters long. There is no way to avoid the nested templates here, as they form a "processing tree" of an example query. Is there any way to keep names short when using deeply nested templates?

gexicide asked Sep 25 '14 10:09



3 Answers

So is there a way to reduce the RAM usage during linkage? And is there a way to have smaller symbols even with nested templates?

Have you considered using the pimpl idiom in client code?

Consider a situation where you have this include chain:

A.h -> B.h -> C.h -> D.h (C includes D, B includes C, etc)

Suppose that A.h defines class AA, B.h defines class BB and so on (with AA implemented in terms of BB, BB implemented in terms of CC and so on).

If DD is a large template and used in the implementation of CC, the templated code will be compiled three times, for the compilation units A, B and C.

Now, consider what happens if instead of C.h including D.h, you have the following situation:

C.h forward declares CCImpl, and CC holds a CCImpl *pimpl to which all of its methods forward via pimpl-> (C.h does not include D.h).

C.cpp includes C.h and D.h, and implements CCImpl and CC.

Now, D.h will be included once (and compiled once, for C.cpp). A and B will only include C.h, with a CCImpl forward declaration. A.h, B.h and C.h no longer know that a template exists.
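A minimal single-file sketch of that layout, with comments marking which part would live in which file. The names DD<T, Fanout>, insert, size and pimpl_demo are hypothetical stand-ins, and std::unique_ptr is used here in place of the raw CCImpl * mentioned above; that is a choice made for this sketch, not something the approach requires.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// ---- C.h: forward declaration only, no #include "D.h" ----
class CCImpl;  // incomplete type; clients never see DD

class CC {
public:
    CC();
    ~CC();                     // defined in C.cpp, where CCImpl is complete
    void insert(int value);    // forwards to pimpl_
    std::size_t size() const;  // forwards to pimpl_
private:
    std::unique_ptr<CCImpl> pimpl_;
};

// ---- D.h: hypothetical heavy template, included only by C.cpp ----
template <typename T, int Fanout>
class DD {
public:
    void insert(const T& value) { items_.push_back(value); }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// ---- C.cpp: includes C.h and D.h, the only place DD is instantiated ----
class CCImpl {
public:
    DD<int, 16> index;
};

CC::CC() : pimpl_(new CCImpl) {}
CC::~CC() = default;
void CC::insert(int value) { pimpl_->index.insert(value); }
std::size_t CC::size() const { return pimpl_->index.size(); }

// ---- A.cpp / B.cpp: client code that only needs what C.h declares ----
std::size_t pimpl_demo() {
    CC c;
    c.insert(1);
    c.insert(2);
    assert(c.size() == 2);
    return c.size();
}
```

The price is one pointer indirection per forwarded call, so this fits best at boundaries that are not on the hot path of the index.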

utnapistim answered Sep 22 '22 13:09


GCC has a garbage collection scheme for the RAM it uses.

The parameters ggc-min-expand and ggc-min-heapsize determine when GCC should clean up and deallocate its unused memory (their defaults are derived from the amount of system RAM).

You could try something like:

g++ --param ggc-min-expand=0 --param ggc-min-heapsize=8192

From the GCC manual:

ggc-min-expand

GCC uses a garbage collector to manage its own memory allocation. This parameter specifies the minimum percentage by which the garbage collector's heap should be allowed to expand between collections. Tuning this may improve compilation speed; it has no effect on code generation.

The default is 30% + 70% * (RAM/1GB) with an upper bound of 100% when RAM >= 1GB. If getrlimit is available, the notion of "RAM" is the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate RAM on a particular platform, the lower bound of 30% is used. Setting this parameter and ggc-min-heapsize to zero causes a full collection to occur at every opportunity. This is extremely slow, but can be useful for debugging.

ggc-min-heapsize

Minimum size of the garbage collector's heap before it begins bothering to collect garbage. The first collection occurs after the heap expands by ggc-min-expand% beyond ggc-min-heapsize. Again, tuning this may improve compilation speed, and has no effect on code generation. The default is RAM/8, with a lower bound of 4096 (four megabytes) and an upper bound of 131072 (128 megabytes). If getrlimit is available, the notion of "RAM" is the smallest of actual RAM, RLIMIT_RSS, RLIMIT_DATA and RLIMIT_AS. If GCC is not able to calculate RAM on a particular platform, the lower bound is used. Setting this parameter very large effectively disables garbage collection. Setting this parameter and ggc-min-expand to zero causes a full collection to occur at every opportunity.
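As a worked example, the two default formulas quoted above can be evaluated for the laptop sizes from the question. This is only a sketch that reads the manual text literally; the function names are made up for illustration and are not part of GCC.

```cpp
#include <algorithm>
#include <cassert>

// Default ggc-min-expand in percent, per the quoted formula:
// 30% + 70% * (RAM / 1 GB), with an upper bound of 100%.
double default_ggc_min_expand_percent(double ram_gb) {
    return std::min(100.0, 30.0 + 70.0 * ram_gb);
}

// Default ggc-min-heapsize in kilobytes, per the quoted formula:
// RAM / 8, clamped to the range [4096, 131072].
long default_ggc_min_heapsize_kb(double ram_gb) {
    const double ram_kb = ram_gb * 1024.0 * 1024.0;
    const double eighth = ram_kb / 8.0;
    return static_cast<long>(std::max(4096.0, std::min(131072.0, eighth)));
}

// Sanity check for a 4 GB laptop and a 32 GB workstation.
void check_ggc_defaults() {
    assert(default_ggc_min_expand_percent(4.0) == 100.0);
    assert(default_ggc_min_heapsize_kb(4.0) == 131072);
    assert(default_ggc_min_heapsize_kb(32.0) == 131072);
}
```

Under this reading, both defaults already sit at their upper bounds from 1 GB of RAM upwards, so a 4 GB laptop and a 32 GB workstation start from the same values (100% and 128 megabytes). The command line above lowers both explicitly.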

Further details:

  • http://hostingfu.com/article/compiling-with-gcc-on-low-memory-vps

manlio answered Sep 21 '22 13:09


Use inheritance judiciously. You may have a class Foo<1,4,8,1,9,int, std::string> as a base class for class Bar, and then the object file will mention just Bar.

Note that typedef does not introduce names for linking purposes.

[edit] And to address a performance concern from another comment: an empty derived class adds no overhead on common compilers at normal optimization levels (and often there's no overhead even in debug builds).
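A small sketch of the difference, assuming a made-up seven-parameter Foo and a hypothetical Wrapper template standing in for the next nesting level. It compares the implementation-defined strings returned by typeid(...).name(), which on GCC and Clang are the mangled type names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a heavily parameterized index template.
template <int A, int B, int C, int D, int E, typename K, typename V>
struct Foo {
    K key{};
    V value{};
};

// typedef: only an alias, the type is still the full Foo<...>.
typedef Foo<1, 4, 8, 1, 9, int, std::string> FooAlias;

// Empty derived class: a distinct type whose own name is just Bar.
struct Bar : Foo<1, 4, 8, 1, 9, int, std::string> {};

// Hypothetical outer template, standing in for the next nesting level.
template <typename T>
struct Wrapper {
    T inner;
};

// Length of the implementation-defined type name.
template <typename T>
std::size_t type_name_length() {
    return std::strlen(typeid(T).name());
}

void check_name_lengths() {
    // The derived class has a shorter name than the full instantiation.
    assert(type_name_length<Bar>() < type_name_length<FooAlias>());
    // The saving carries over to templates instantiated with Bar.
    assert(type_name_length<Wrapper<Bar> >() <
           type_name_length<Wrapper<FooAlias> >());
}
```

Note that members inherited from Foo keep their long names, since they still belong to Foo<...>. The saving shows up in symbols of other templates that take Bar as an argument, which is where deep nesting multiplies the length.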

MSalters answered Sep 20 '22 13:09