 

How does the size of a binary influence the execution speed [closed]

How does the size of a binary influence its execution speed? Specifically, I am talking about code written in ANSI C and translated into machine language using the GNU or Intel compiler. The target platforms for the binary are modern computers with Intel or AMD multi-core CPUs running a Linux operating system. The code performs numerical computations, possibly in parallel using OpenMP, and the binary could be several megabytes in size.

Note that the execution time will in any case be much larger than the time needed to load the code and libraries. I am thinking of very specific codes used to solve large systems of ordinary differential equations for simulations of kinetic equations, which are typically CPU-bound for moderate system sizes but can also become memory-bound.

I am asking whether small binary size should be a design criterion for highly efficient code, or whether I can always give preference to explicit code (which may repeat code blocks that could instead be implemented as functions) and compiler optimizations such as loop unrolling.
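To make the trade-off concrete, here is a minimal sketch of what I mean; the dot-product kernel is just an invented stand-in workload, not code from my project:

```c
#include <stddef.h>

/* compact form: small code footprint; the compiler may unroll or
 * vectorize this itself (e.g. gcc with -O3 -funroll-loops) */
double dot_compact(const double *a, const double *b, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}

/* hand-unrolled form: the same work, but more instructions in the
 * binary; four accumulators break the dependency chain */
double dot_unrolled(const double *a, const double *b, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; ++i)   /* remainder elements */
        s0 += a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```

The second variant is representative of the kind of "explicit code" I mean: it trades binary size for (possibly) fewer cycles per element.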

I am aware of profiling techniques and how to apply them to specific problems, but I wonder to what extent general statements can be made.

asked Oct 17 '12 by highsciguy


2 Answers

CPUs have caches.

Compared to the CPU's speed, access to system memory is slow. That's why CPUs have caches, made of much faster memory.

Each level of CPU cache has a different size and speed.

Therefore, to achieve the highest possible speed, it is critically important to avoid misses in the lowest cache levels, which are unfortunately also the smallest.

Both code and data occupy cache space, and either can force a cache reload. So size matters in both cases.

For example, code may generate a cache miss when you jump or call into a region that is not cached, and data may generate a cache miss when you load a variable from a remote address.
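To illustrate the data side, here is a minimal sketch (the matrix dimension `N` is an arbitrary choice for illustration): the same sum computed twice, once walking memory contiguously and once striding across rows.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 4096

int main(void) {
    /* zero-initialized N x N matrix in row-major order */
    double *m = calloc((size_t)N * N, sizeof *m);
    double sum = 0.0;
    if (!m) return 1;

    /* cache-friendly: consecutive addresses, so every byte of a
     * fetched cache line is used before the line is evicted */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            sum += m[i * N + j];

    /* cache-hostile: a stride of N * sizeof(double) bytes means
     * nearly every access touches a different cache line */
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            sum += m[i * N + j];

    printf("%f\n", sum);
    free(m);
    return 0;
}
```

Both loops do identical arithmetic; timing them separately shows how much of the cost is pure cache behaviour.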

There are other issues, such as alignment, which can greatly influence speed, but nothing costs more than a CPU cache miss: a miss that has to go all the way to main memory can cost something like 250 CPU cycles, and keeping caches coherent across cores adds further overhead.
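The alignment point is easy to see directly. Here is a small sketch (the struct names and fields are made up) showing how member order changes a struct's size, and therefore how many objects fit in one cache line:

```c
#include <stdio.h>

struct padded {          /* poor ordering: padding inflates the struct */
    char   flag;         /* 1 byte + 7 bytes padding before 'value' */
    double value;        /* 8 bytes, must be 8-byte aligned */
    char   tag;          /* 1 byte + 7 bytes tail padding */
};                       /* typically 24 bytes on x86-64 Linux */

struct reordered {       /* largest members first: no internal padding */
    double value;        /* 8 bytes */
    char   flag;         /* 1 byte */
    char   tag;          /* 1 byte + 6 bytes tail padding */
};                       /* typically 16 bytes on x86-64 Linux */

int main(void) {
    printf("padded:    %zu bytes\n", sizeof(struct padded));
    printf("reordered: %zu bytes\n", sizeof(struct reordered));
    return 0;
}
```

With the reordered layout, four objects fit in a 64-byte cache line instead of two, so an array of them causes fewer misses.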

Without going into platform-specific details, that is about all that can be said.

Conclusion: keep it simple. And small is beautiful.

answered Nov 14 '22 by Gil


The CPU only ever executes one part of the code at a time, so it's the content of the code, and how much you move around within it, that determines speed.

If you have 10 MB of code and the first 9 MB is executed only once at startup, then it doesn't matter whether that part is slow, or whether it's 90 MB or 90 kB. If the CPU spends 99.99% of its time in some small, tight loop doing very efficient calculations, it will be fast; if it has to run through 100,000 lines of code again and again, it will probably be much slower.

Optimisation is about seeing where the CPU spends most of its time and making that code as efficient as possible, in terms of the number of CPU cycles taken to get to the answer. Sometimes that can mean adding extra "prep" code outside the hot path to make the main part's job easier and faster, as in the sketch below.
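As a hedged sketch of that idea (the sine table is an invented example, not anything from the question): precompute a lookup table once, outside the hot loop, so the loop body stays small and fast.

```c
#include <math.h>
#include <stdio.h>

#define TABLE_SIZE 1024

static double table[TABLE_SIZE];

/* the "prep" code: runs once, its cost amortised over the whole run */
static void prep(void) {
    const double two_pi = 8.0 * atan(1.0);
    for (int i = 0; i < TABLE_SIZE; ++i)
        table[i] = sin(two_pi * i / TABLE_SIZE);
}

/* the hot loop: one indexed load per element instead of a libm sin()
 * call, at the price of TABLE_SIZE doubles of extra data */
static double sum_sines(const unsigned *phase, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += table[phase[i] % TABLE_SIZE];
    return s;
}

int main(void) {
    unsigned phases[8] = {0, 128, 256, 384, 512, 640, 768, 896};
    prep();
    printf("%f\n", sum_sines(phases, 8));
    return 0;   /* compile with: gcc -O2 example.c -lm */
}
```

Note the trade-off: the table makes the binary's data footprint bigger, but the code the CPU loops over is smaller and cheaper, which is usually what wins.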

In some systems, binary size is a major concern (e.g. embedded devices), but in others it's almost completely irrelevant.

See also: http://www.codeproject.com/Articles/6154/Writing-Efficient-C-and-C-Code-Optimization

answered Nov 14 '22 by John U