
Why is this trivial program so large when compiled?

I created a file containing the following line:

int main() { return 0; }

After compiling this, I was surprised to find out that the binary for this simple program is 8328 bytes! What is going on here, and what in the world is the binary doing in those 8328 bytes? Surely this program can be expressed in just a few lines of assembly.

Note: I compiled this with the following line:

g++ main.cpp

My g++ version is g++ (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1

Cory Klein asked Jul 30 '12 16:07


3 Answers

There's a lot in that binary:

  • a header to make the binary self-describing (try running file on it)
  • a symbol table, which the strip tool will remove for you (or link with gcc -s)
  • the names and locations of shared libraries that you never use (five of them on my box; try the ldd and strings tools)
  • startup code that loads those libraries and sets up argc and argv, then calls main
  • shutdown code that returns main's return value to the operating system.
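The "self-describing header" in the first bullet can be checked by hand. The following is a minimal sketch, assuming the binary is an ELF file (the usual format on Linux); `describe_elf_ident` is a hypothetical helper name, and it only looks at the first six identification bytes (magic number, word size, byte order), which is a small part of what the `file` tool reports.

```cpp
#include <cstddef>
#include <string>

// Describe the ELF identification bytes at the start of a file image.
// Layout assumed here: bytes 0-3 are the magic number 0x7f 'E' 'L' 'F',
// byte 4 is the class (1 = 32-bit, 2 = 64-bit),
// byte 5 is the data encoding (1 = little-endian, 2 = big-endian).
// describe_elf_ident is a hypothetical name, not part of any library.
std::string describe_elf_ident(const unsigned char* buf, std::size_t len) {
    if (len < 6 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F') {
        return "not ELF";
    }

    std::string out = "ELF";

    if (buf[4] == 1) {
        out += " 32-bit";
    } else if (buf[4] == 2) {
        out += " 64-bit";
    } else {
        out += " unknown-class";
    }

    if (buf[5] == 1) {
        out += " LSB";
    } else if (buf[5] == 2) {
        out += " MSB";
    } else {
        out += " unknown-endianness";
    }

    return out;
}
```

To try it on a real binary, read the first few bytes of a.out into a buffer (for example with std::ifstream in binary mode) and pass them to the function.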

For comic effect, try linking that program statically: your binary will then include the functions that would normally be loaded from shared libraries at run time. (Static linking does, however, simplify deployment.)

Fred Foo answered Oct 03 '22 04:10


Do a binary dump of the resulting file and check it out!

It's mostly empty space. Data in the binary are organized into pages (commonly, 4096 or 8192 bytes in size). That's so pages can be memory mapped efficiently. Typically the first page contains instructions on how to load the binary - code is at this position in the file and gets mapped to this location, same for data, etc. The second page will probably be your code, and the third page will contain symbols and debugging information. Each page is probably mostly empty.
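The page arithmetic is easy to sketch. This is only an illustration of the rounding involved, not a description of how the linker actually lays out any particular file; the helper names `pages_spanned` and `padding_in_last_page` are hypothetical, and the page sizes are the ones mentioned above.

```cpp
#include <cstddef>

// Number of fixed-size pages needed to hold `bytes` bytes (ceiling division).
// Assumes page_size is non-zero.
constexpr std::size_t pages_spanned(std::size_t bytes, std::size_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

// Unused bytes between the end of the data and the end of its last page.
constexpr std::size_t padding_in_last_page(std::size_t bytes, std::size_t page_size) {
    return pages_spanned(bytes, page_size) * page_size - bytes;
}

// The 8328-byte file from the question, measured against 4096-byte pages:
// it reaches into a third page and leaves most of that page unused.
static_assert(pages_spanned(8328, 4096) == 3, "8328 bytes touch three 4096-byte pages");
static_assert(padding_in_last_page(8328, 4096) == 3960, "3960 bytes of the last page are unused");
```

A few bytes of content in each of a few pages is enough to push the total to several kilobytes, even when the code itself is tiny.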

Keith Randall answered Oct 03 '22 03:10


Don't bother.

Try writing a less trivial program and you will find that the size barely changes until your own code grows to hundreds of kilobytes.

Briefly: parts of the standard library form the "infrastructure" between the OS modules and the C++ semantics, managing the startup and termination of the program (everything that initializes and destroys global variables, standard input and output, etc.).
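A small experiment makes the startup half of this visible. The sketch below is illustrative only: `Tracer`, `event_log`, and `global_was_constructed_before` are hypothetical names, and the point is simply that something has to run the constructor of a global object before main() starts, which is work done by the startup code linked into every program.

```cpp
#include <string>
#include <vector>

// Event log kept in a function-local static, so it is created on first use
// and is guaranteed to exist when the global below is constructed.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical type whose constructor records the moment it runs.
struct Tracer {
    Tracer() { event_log().push_back("global constructed"); }
};

// Constructed by the runtime's startup code, in practice before main() is entered.
Tracer global_tracer;

// Call this as the first statement of main(): it appends `marker` to the log
// and reports whether the global's constructor had already run by then.
bool global_was_constructed_before(const std::string& marker) {
    event_log().push_back(marker);
    return event_log().front() == "global constructed";
}
```

Calling global_was_constructed_before("main entered") at the top of main() should return true, showing that code ran before main() did, even though main() never asked for it.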

Plus: everything that maps the C++ symbols to memory addresses, so that a debugger can show the proper source code references during execution (unless you asked for it to be removed: try the -O3 -s options and drop -g).

Also: because of the way memory is laid out, a binary is normally made up of fixed-size chunks. Your program may be even shorter, but at least one code segment, one data segment initializer, and one shared segment (for constant values) must be present.

Emilio Garavaglia answered Oct 03 '22 02:10