When a binary file runs, does it copy its entire binary data into memory at once? Could I change that?

Does the system copy the entire binary into memory before executing it? I am interested in this and would like to change how it works. I mean, if the binary is 100 MB (which seems unlikely), could I start running it while it is still being copied into memory? Is that possible?

Or could you tell me how to see the way it runs? Which tools do I need?

939

asked Dec 14 '11 15:12

bxshi

1 Answer

The theoretical model for an application-level programmer makes it appear that this is so. In point of fact, the normal startup process (at least in Linux 1.x; I believe 2.x and 3.x are optimized but similar) is:

The kernel creates a process context (more-or-less, virtual machine)
Into that process context, it defines a virtual memory mapping that maps from virtual addresses in the process to the contents of your executable file
Assuming that you're dynamically linked (the default/usual), the ld.so program (e.g. /lib/ld-linux.so.2) named in your program's headers sets up memory mappings for shared libraries
The kernel does a jmp into the startup routine of your program (for a C program, that's something like _start from crt1.o, which eventually calls main). Since the kernel has only set up the mapping, and has not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an interrupt (exception, signal) to the kernel.
The kernel's Page Fault handler loads some section of your program, including the part that caused the page fault, into RAM.
As your program runs, if it accesses a virtual address that doesn't have RAM backing it up right now, Page Faults will occur and cause the kernel to suspend the program briefly, load the page from disc, and then return control to the program. This all happens "between instructions" and is normally undetectable.
As you use malloc/new, the kernel creates read-write pages of RAM (without disc backing files) and adds them to your virtual address space.
If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
As the system runs out of physical RAM, pages of RAM get evicted; if they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are reloaded from their source when needed; if they're read-write (like memory you "created" using malloc), they get written out to swap (page file = swap file = swap partition = on-disc virtual memory). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.

Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.

So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).

If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).

Sometimes, linking statically can improve the performance of a program, but at a major cost in RAM: since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.

(*) In point of fact, the kernel is a bit smarter, and will generally preload some pages to reduce the number of page faults, but the theory is the same, regardless of the optimizations.

106

answered Oct 11 '22 19:10

BRPocock

Tags: c, linux, binary