
When a binary file runs, does it copy its entire binary data into memory at once? Could I change that?

Tags: c, linux, binary

Does the system copy the entire binary into memory before it executes? I am curious about this and would like to change that behavior if so. For example, if the binary were 100 MB (which seems unlikely), could it start running while it is still being copied into memory? Is that possible?

Alternatively, could you tell me how to observe the way a binary is loaded and run? Which tools do I need?

bxshi asked Dec 14 '11 15:12



1 Answer

The theoretical model presented to an application-level programmer makes it appear that this is so. In practice, the normal startup process (at least in Linux 1.x; I believe 2.x and 3.x are optimized but similar) is:

  • The kernel creates a process context (more or less a virtual machine).
  • Into that process context, it defines a virtual memory mapping that maps virtual addresses onto the contents of your executable file.
  • Assuming that you're dynamically linked (the default/usual), the ld.so program (e.g. /lib/ld-linux.so.2) named in your program's headers sets up memory mappings for the shared libraries.
  • The kernel does a jmp into the startup routine of your program (for a C program, that's something like _start from crt1.o, which eventually calls main). Since it has only set up the mapping, and not actually loaded any pages(*), this causes a Page Fault from the CPU's Memory Management Unit, which is an exception (a kind of interrupt) handled by the kernel.
  • The kernel's Page Fault handler loads some section of your program, including the part that caused the page fault, into RAM.
  • As your program runs, if it accesses a virtual address that doesn't have RAM backing it up right now, Page Faults will occur and cause the kernel to suspend the program briefly, load the page from disc, and then return control to the program. This all happens "between instructions" and is normally undetectable.
  • As you use malloc/new, the allocator asks the kernel for more memory when it needs it, and the kernel adds read-write pages of RAM (without disc backing files) to your virtual address space.
  • If you throw a Page Fault by trying to access a memory location that isn't set up in the virtual memory mappings, you get a Segmentation Violation Signal (SIGSEGV), which is normally fatal.
  • As the system runs out of physical RAM, pages of RAM get removed. If they are read-only copies of something already on disc (like an executable, or a shared object file), they just get de-allocated and are later reloaded from their source. If they're read-write (like memory you "created" using malloc), they get written out to swap (also known as the page file, swap file, swap partition, or on-disc virtual memory). Accessing these "freed" pages causes another Page Fault, and they're re-loaded.

Generally, though, until your process is bigger than available RAM — and data is almost always significantly larger than the executable — you can safely pretend that you're alone in the world and none of this demand paging stuff is happening.

So: effectively, the kernel already is running your program while it's being loaded (and might never even load some pages, if you never jump into that code / refer to that data).

If your startup is particularly sluggish, you could look at the prelink system to optimize shared library loads. This reduces the amount of work that ld.so has to do at startup (between the exec of your program and main getting called, as well as when you first call library routines).

Sometimes, linking statically can improve performance of a program, but at a major expense of RAM — since your libraries aren't shared, you're duplicating "your libc" in addition to the shared libc that every other program is using, for example. That's generally only useful in embedded systems where your program is running more-or-less alone on the machine.

(*) In point of fact, the kernel is a bit smarter and will generally preload some pages to reduce the number of page faults, but the theory is the same regardless of the optimizations.

BRPocock answered Oct 11 '22 19:10