 

Does source code amalgamation really improve the performance of a C or C++ program? [closed]

Code amalgamation consists of copying the whole source code into a single file.

For instance, SQLite does this to reduce the compile time and improve the performance of the resulting executable. In its case, the result is a single file of 184K lines of code.

My question is not about compile time (already answered in this question), but about the efficiency of the executable.

SQLite developers say:

In addition to making SQLite easier to incorporate into other projects, the amalgamation also makes it run faster. Many compilers are able to do additional optimizations on code when it is contained within a single translation unit such as it is in the amalgamation. We have measured performance improvements of between 5 and 10% when we use the amalgamation to compile SQLite rather than individual source files. The downside of this is that the additional optimizations often take the form of function inlining which tends to make the size of the resulting binary image larger.

From what I understood, this is due to interprocedural optimization (IPO), an optimization made by the compiler.
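To illustrate the idea, here is a minimal sketch (the file names and functions are hypothetical, not taken from SQLite). When the two pieces below are separate translation units, the compiler building sum_of_squares sees only the declaration of square and must emit an opaque call; once they are merged into a single translation unit, it sees the body and can inline it:

    /* square.c -- hypothetical helper, normally its own translation unit */
    int square(int x)
    {
        return x * x;
    }

    /* main.c -- compiled separately, only the declaration of square() is
       visible here, so the loop contains a real function call. Merged into
       one translation unit (amalgamation), the optimizer can inline square()
       and simplify the loop body. */
    int square(int x);

    long sum_of_squares(int n)
    {
        long total = 0;
        for (int i = 0; i < n; ++i)
            total += square(i);  /* inlinable only with a whole-TU view or LTO */
        return total;
    }

(For completeness, GCC's link-time optimization, -flto, can give the compiler a similar cross-file view without merging the sources.)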

GCC developers also say this (thanks @nwp for the link):

The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file mode allows the compiler to use information gained from all of the files when compiling each of them.

But they do not say anything about the size of the possible gain.

Are there any measurements, apart from those of SQLite, which confirm or refute the claim that IPO with amalgamation produces faster executables than IPO without amalgamation when compiled with gcc?

As a side question, with respect to this optimization, is code amalgamation the same thing as #include-ing all the .cpp (or .c) files into one single file?
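To make that concrete, here is a minimal sketch of what I mean, sometimes called a "unity build" (the file names are hypothetical):

    /* amalgam.c -- hypothetical single translation unit produced by
       #include-ing the implementation files instead of concatenating them.
       The compiler sees one translation unit either way; the main practical
       difference is that clashes between file-local (static) symbols and
       macros become possible. */
    #include "square.c"
    #include "main.c"

    /* Compiled as a single unit, e.g.:  gcc -O2 -c amalgam.c */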

asked Aug 11 '16 by Tom Cornebize

1 Answer

The organization of the source-code files will not, by itself, "produce a more efficient binary," and the time spent reading many source files instead of one during compilation is negligible.

A version control system will take deltas of any file regardless of size.

Ordinarily, separate components such as these are compiled separately to produce binary libraries containing the associated object code: the source code is not recompiled each time. When an "application A" uses a "library B" that has changed, "application A" must be re-linked, but it does not have to be recompiled if the library's API has not changed.

And, in terms of the library itself, if it consists of (hundreds of) separate source files, only the files that have changed have to be recompiled before the library is re-linked. (Any Makefile will do this.) If the source code were "one huge thing," you would have to recompile all of it every time, and that could take a long time ... basically, a waste of time.

There are two ways in which the object code from a library (once it has been built) can be incorporated into an executable: static linking and dynamic linking. If static linking is used, the necessary parts of the library are copied into the executable ... but not all of it. The library file does not have to be present when the executable is run.

If dynamic linking is used, the entire library exists in a separate file (e.g. .DLL or .so) which does have to be present at runtime but which will be shared by every application that is using it at the same time.

I recommend that you primarily view this as a source-code management issue, not as something that will confer any sort of technical or runtime advantages. (It will not.) I find it difficult to see a compelling reason to do this at all.

answered Nov 15 '22 by Mike Robinson