In the man pages I've been reading, it seems popen, system, etc. tend to call fork(). In turn, fork() copies the process's entire memory state. This seems really heavy, especially when in many situations a child from a call to fork() uses little if any of the memory allocated for the parent.
So, my question is, can I get fork() like behavior without duplicating the whole memory state of the parent process? Or is there something I am missing, such that fork() is not as heavy as it appears (like, maybe calls tend to be optimized to avoid unnecessary memory duplication)?
The return value of fork is recorded in a variable of type pid_t, which is the POSIX type for process identifiers (PIDs).
Microsoft Windows does not support the fork-exec model, as it does not have a system call analogous to fork() . The spawn() family of functions declared in process. h can replace it in cases where the call to fork() is followed directly by exec() .
fork() duplicates the entire process. The only difference is in the return value of the fork() call itself -- in the parent it returns the child's PID, in the child it returns 0 . Most operating systems optimize this, using a technique called copy on write.
This is without exception. What fork() does is the following: It creates a new process which is a copy of the calling process. That means that it copies the caller's memory (code, globals, heap and stack), registers, and open files.
fork(2) is, as all syscalls, a primitive operation (but some C libraries use clone(2) for it), from the point of view of user-space application. It is mostly a single machine instruction SYSCALL
or SYSENTER
to switch from user-mode to kernel-mode, then the (recent version of) Linux kernel is doing quite significant processing.
It is in practice quite efficient (e.g. less than a millisecond, and sometimes even less than a tenth of it) because the kernel is extensively using lazy copy-on-write techniques to share pages between parent & child processes. The actual copying would happen later, on page faults, when overwriting a shared page.
And forking has a huge advantage, since the starting of some other program is delegated to execve(2): it is conceptually simple: the only difference between the parent & child processes is the result of fork
BTW on POSIX systems such as Linux, fork(2) or the suitable clone(2) equivalent is the only way to create a process (there are some few weird exceptions that you should generally ignore: the kernel is making some processes like /sbin/init
etc...), since vfork(2) is obsolete.
The problem is that to run the main function of a standardly linked executable, you need to call execve
, and exec replaces the whole process image and so you need a new address space, which is what fork
is for.
You can get around this by having your calee expose its main
functionality in a shared library (but then it must not be called main), and then you can load the function with the main
functionality without having to fork (provided there are no symbol conflicts).
That would be a more efficient alternative to system
(basically with the efficiency of a function call).
Now popen
involves pipes and to use pipes you need to have the pipe ends in different schedulable units. Threads, which use the same address space, can be used here as a lighter alternative to separate processes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With