I'm having some difficulty understanding how the OS passes data from the address space of a parent process to the address space of a child process. Namely, in a C program, where are argc and argv stored when they are passed into main?
I understand how argv is essentially a double pointer. What I'm not understanding is what the OS does with those values after loading them into the kernel. After creating an address space for the child process does it push these values on the stack of the new space? We obviously don't want to pass in pointers to another address space.
For the record, I'm working with the MIPS32 architecture.
On Linux, at least on the architectures I've played with, the process starts with the stack pointer (%esp on x86) pointing to something like:
argc | argv[0] | argv[1] | ... argv[argc - 1] | argv[argc] == NULL | envp[0] | envp[1] ... envp[?] == NULL
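The first function called is traditionally named _start, and its job is to calculate (argc = *(int *)%esp, argv = ((char **)%esp) + 1, envp = ((char **)%esp) + argc + 2), then call main with the appropriate calling convention. Note that the arithmetic is in pointer-sized words, not bytes: argv starts one word above argc, and envp starts after the argc entries of argv plus its terminating NULL.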
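The calculation above can be sketched in portable C by modelling the initial stack as an array of pointer-sized words. This is only an illustration: a real _start is written in assembly and reads the actual stack pointer, and the names start_args and decode_initial_stack are hypothetical, not part of any C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three values handed to main. */
struct start_args {
    int    argc;
    char **argv;
    char **envp;
};

/* sp models the initial stack pointer: sp[0] is the argc slot, followed by
   argv[0..argc-1], a NULL, then envp[0..], and a final NULL. */
struct start_args decode_initial_stack(char **sp)
{
    struct start_args a;

    assert(sp != NULL);
    a.argc = (int)(intptr_t)sp[0];  /* argc is the value stored in the first word */
    a.argv = sp + 1;                /* argv[0] lives one word above argc */
    a.envp = sp + a.argc + 2;       /* skip argc, argc pointers, and the NULL */
    return a;
}
```

Given a word array such as { 2, "prog", "hello", NULL, "HOME=/tmp", NULL }, the function returns argc == 2, argv pointing at the "prog" slot, and envp pointing at the "HOME=/tmp" slot.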
On x86, the arguments get passed on the stack.
On amd64, they get passed in registers %rdi, %rsi, and %rdx.
On MIPS, Google tells me there are several different calling conventions in use - including O32, N32, N64 - but all of them use $a0, $a1, $a2 first.
The process is different for different operating systems, and indeed differs depending on how a new process is created. Since I'm more familiar with how modern Microsoft OSes handle this, I'll start there, and make a reference to *nix systems at the end.
When the [Microsoft] OS creates a process, it allocates a process environment block to hold data specific to that process. This includes, among other things, command line arguments with which the program was invoked. This process environment block is allocated out of the target process's address space, and a pointer to it is provided to the process's entry point. The process environment block for a child process is generally initialized by copying the parent process's environment block into the new process's address space - there's no direct sharing of memory involved.
In the case of a C-based program, the entry point is not the main() function that the programmer provides. Rather, it is a routine provided by the C runtime library that is responsible for initializing the runtime environment before handing control to the programmer's main().
There is a lot of stuff to initialize, but one of the aspects is setting up the argc and argv values. To do this, the runtime will consult the process environment block to find the program name and parameters with which it was invoked. It will then (typically) allocate values for argv out of the process heap (i.e., using something like malloc()), and assign argc to the number of params found (plus one, for the program name).
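As a rough illustration of that step, the sketch below splits a command-line string into a heap-allocated argv. It is a simplified stand-in, not the actual runtime code: the function name build_argv is hypothetical, the command line is passed in as a plain string rather than read from the process environment block, and tokens are split on spaces and tabs only, whereas a real runtime also handles quoting and escaping.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split cmdline on spaces/tabs into a NULL-terminated
   argv. The pointer array and the string copy share one heap block, so a
   single free(argv) releases everything. Returns NULL if malloc fails. */
char **build_argv(const char *cmdline, int *argc_out)
{
    assert(cmdline != NULL && argc_out != NULL);

    size_t len = strlen(cmdline);
    /* A string of length len holds at most len/2 + 1 tokens; one more slot
       is needed for the terminating NULL pointer. */
    size_t slots = len / 2 + 2;

    char **argv = malloc(slots * sizeof *argv + len + 1);
    if (argv == NULL)
        return NULL;

    /* Writable private copy of the command line, placed after the pointers. */
    char *p = (char *)(argv + slots);
    memcpy(p, cmdline, len + 1);

    int argc = 0;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t')               /* skip separators */
            p++;
        if (*p == '\0')
            break;
        argv[argc++] = p;                             /* start of a token */
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
        if (*p != '\0')
            *p++ = '\0';                              /* terminate the token in place */
    }
    argv[argc] = NULL;
    *argc_out = argc;
    return argv;
}
```

For the input "prog.exe -v file.txt" this yields argc == 3, with argv[0] being the program name, which matches the "plus one" counting described above.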
The actual values for argc and argv are passed like any other parameters in C (on the stack or in registers, depending on the calling convention), because the call to main() by the C runtime is just a normal function call.
So, when the code you write inside main() in the child process accesses argv, it will be reading values out of its own process heap. The source of those values is the process environment block (stored by the OS in the local address space), which was originally initialized by copying the process environment block from the parent process.
On *nix platforms, things are quite a bit different. The primary difference for the present discussion is that *nix stores the command-line arguments for the new process directly in the stack space of the process's initial thread. (It also stores environment variables there.) So on *nix, main is invoked with the argv parameter pointing to values stored in the stack itself.
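To make the "no pointers into another address space" point concrete, here is a simplified user-space model of that copy. It is a sketch under stated assumptions, not kernel code: copy_args_to_stack is a hypothetical name, the "stack" is just a caller-supplied buffer, and the argc slot, environment strings, and other data a real kernel places there are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: copy the argument strings to the top of the new
   stack and build a fresh pointer vector just below them. Returns the new
   argv (located inside stack), or NULL if the buffer is too small. */
char **copy_args_to_stack(char *stack, size_t stack_size,
                          int argc, char *const parent_argv[])
{
    assert(stack != NULL && parent_argv != NULL && argc >= 0);

    /* Pass 1: total size of all strings, including their terminators. */
    size_t bytes = 0;
    for (int i = 0; i < argc; i++)
        bytes += strlen(parent_argv[i]) + 1;

    size_t vec_bytes = ((size_t)argc + 1) * sizeof(char *);
    /* One extra word of slack leaves room for aligning the vector. */
    if (bytes + vec_bytes + sizeof(char *) > stack_size)
        return NULL;

    char *strings = stack + stack_size - bytes;

    /* Place the vector below the strings, aligned down to a pointer boundary. */
    uintptr_t v = (uintptr_t)strings - vec_bytes;
    v -= v % sizeof(char *);
    char **new_argv = (char **)v;

    /* Pass 2: copy each string and record its NEW address. */
    char *dst = strings;
    for (int i = 0; i < argc; i++) {
        size_t n = strlen(parent_argv[i]) + 1;
        memcpy(dst, parent_argv[i], n);
        new_argv[i] = dst;
        dst += n;
    }
    new_argv[argc] = NULL;
    return new_argv;
}
```

Every pointer in the returned vector refers to a copy inside the new buffer, so nothing in it depends on the parent's memory remaining mapped.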
You can glean some of this from the man page for execve, while The Linux Programming Interface by Michael Kerrisk has a good description in section 6.4 that you might find excerpted online.