
Lighter weight alternatives to fork() in POSIX C?

Tags: c, fork, posix

In the man pages I've been reading, it seems popen, system, etc. tend to call fork(). In turn, fork() copies the process's entire memory state. This seems really heavy, especially when in many situations a child from a call to fork() uses little if any of the memory allocated for the parent.

So, my question is: can I get fork()-like behavior without duplicating the whole memory state of the parent process? Or is there something I am missing, such that fork() is not as heavy as it appears (for example, maybe calls tend to be optimized to avoid unnecessary memory duplication)?

Kyle asked Oct 20 '15 21:10



2 Answers

From the point of view of a user-space application, fork(2) is, like every syscall, a primitive operation (although some C libraries implement it with clone(2)). Entering the kernel takes essentially a single machine instruction, SYSCALL or SYSENTER, which switches from user mode to kernel mode; the (recent) Linux kernel then does a significant amount of processing to set up the child.

It is in practice quite efficient (e.g. less than a millisecond, and sometimes even less than a tenth of that) because the kernel makes extensive use of lazy copy-on-write to share pages between the parent and child processes. The actual copying happens later, on a page fault, when one of the processes writes to a shared page.
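A minimal sketch of what this looks like from user space, assuming a POSIX system: the child overwrites one byte of a large buffer, and the parent still sees its own data afterwards. The names fork_isolation_demo and buf_size are hypothetical, and the code shows only the visible effect (separate copies after a write), not the kernel's page-level bookkeeping.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child overwrite one byte of a
 * buffer, and returns the byte as the parent sees it afterwards.
 * Returns -1 on any error. */
int fork_isolation_demo(size_t buf_size)
{
    if (buf_size == 0)
        return -1;

    char *buf = malloc(buf_size);
    if (buf == NULL)
        return -1;
    memset(buf, 'p', buf_size);   /* touch every page in the parent */

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: this write changes only the child's view of the page. */
        buf[0] = 'c';
        _exit(buf[0] == 'c' ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(buf);
        return -1;
    }

    int seen = buf[0];            /* the parent's copy is unchanged */
    free(buf);
    return seen;
}
```

Calling fork_isolation_demo(1 << 20) from main should return 'p': the child's write never reaches the parent, even though no copy of the buffer was requested explicitly.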

Forking also has a huge advantage: because starting another program is delegated to execve(2), fork itself stays conceptually simple. The only difference between the parent and child processes is the return value of fork.
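The usual fork-then-exec pattern can be sketched as follows. run_program is a hypothetical helper name, and the sketch uses execvp(3) (a PATH-searching wrapper around execve) for convenience rather than calling execve(2) directly.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (looked up via PATH) in a child
 * process and returns its exit status, or -1 on failure. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: fork returned 0, so replace this process image. */
        execvp(argv[0], argv);
        _exit(127);               /* reached only if exec failed */
    }

    /* Parent: fork returned the child's PID, so wait for it. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The two branches on pid are the whole point: the same code runs in both processes, and only the value returned by fork tells them apart.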

BTW, on POSIX systems such as Linux, fork(2) or the suitable clone(2) equivalent is the only way to create a process at the system-call level, since vfork(2) is obsolete. (There are a few odd exceptions that you should generally ignore: the kernel itself creates some processes, such as /sbin/init.)
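For completeness, POSIX also specifies posix_spawn(3), a library-level interface that combines the fork and exec steps into one call; on Linux the C library builds it on top of the system calls named above rather than on a separate kernel facility. A minimal sketch, with spawn_program as a hypothetical helper name:

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: starts argv[0] (looked up via PATH) with
 * posix_spawnp and returns its exit status, or -1 on failure. */
int spawn_program(char *const argv[])
{
    pid_t pid;

    /* NULL file actions and attributes: inherit everything by default. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Whether this is measurably cheaper than a plain fork followed by exec depends on the C library and on how much memory the parent has mapped, so treat it as an option to benchmark rather than a guaranteed win.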

Basile Starynkevitch answered Nov 07 '22 07:11


The problem is that to run the main function of a normally linked executable, you need to call execve. exec replaces the whole process image, so you need a new address space for it, which is what fork is for.

You can get around this by having your callee expose its main functionality in a shared library (in which case the entry point must not be called main). You can then load that function and call it directly, without having to fork (provided there are no symbol conflicts).
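A rough sketch of the loading step, assuming the callee exports a function of type int (*)(void). The names call_entry and entry_fn are hypothetical; lib_path would be the path of the callee's shared library, and passing NULL instead searches the already-loaded program and its libraries. On older glibc versions this needs linking with -ldl.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed signature of the exported entry point (hypothetical). */
typedef int (*entry_fn)(void);

/* Hypothetical helper: looks up `symbol` in the library at `lib_path`
 * (or in the running program when lib_path is NULL), calls it, and
 * stores its return value in *result. Returns 0 on success, -1 if the
 * library or the symbol cannot be found. */
int call_entry(const char *lib_path, const char *symbol, int *result)
{
    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL)
        return -1;

    entry_fn fn = (entry_fn)dlsym(handle, symbol);
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *result = fn();               /* an ordinary function call, no fork */
    dlclose(handle);
    return 0;
}
```

The called code runs inside the caller's process, so a crash or a call to exit() in the library takes the caller down with it; that is the price of skipping fork.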

That would be a more efficient alternative to system (basically with the efficiency of a function call). popen, on the other hand, involves pipes, and to use a pipe you need its two ends in different schedulable units. Threads, which share the same address space, can be used here as a lighter alternative to separate processes.
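The thread-plus-pipe idea could look roughly like this. It is a sketch, not a drop-in popen replacement: producer and read_from_thread are hypothetical names, the message is a fixed string small enough to fit in the pipe buffer, and the program must be built with pthread support.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Thread body: writes a fixed message to the pipe's write end, then
 * closes it so the reader sees end-of-file. */
static void *producer(void *arg)
{
    int wfd = *(int *)arg;
    static const char msg[] = "hello from thread";
    ssize_t written = write(wfd, msg, sizeof msg - 1);
    (void)written;                /* error handling omitted in this sketch */
    close(wfd);
    return NULL;
}

/* Hypothetical helper: reads what the producer thread wrote into `out`
 * (NUL-terminated). Returns the number of bytes read, or -1 on error. */
ssize_t read_from_thread(char *out, size_t cap)
{
    int fds[2];
    pthread_t tid;

    if (cap == 0 || pipe(fds) != 0)
        return -1;
    if (pthread_create(&tid, NULL, producer, &fds[1]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';

    pthread_join(tid, NULL);      /* join first, so fds[1] stays valid */
    close(fds[0]);
    return (ssize_t)total;
}
```

Here the two pipe ends live in two threads of one process, so no address space is duplicated at all; the trade-off is that the producer shares all of the caller's memory and can corrupt it.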

PSkocik answered Nov 07 '22 08:11