 

Is the unix fork exec sequence really as expensive as it sounds?

I'm reading about fork and exec for an exam, and my book says that whenever a new (different) program needs to be run on a Unix system, the current process forks and then calls execve.

However, it also says that whenever fork is called, the whole memory image of the parent is copied to the new process.

Then my question is: What if you have a process with a really big memory image, and you just want to run a new process? Isn't it a waste of resources to copy all the data from the parent process if you are just going to replace it immediately?

asked Dec 03 '11 by bobbaluba

People also ask

How is fork () and exec () calls different from each other?

The main difference is that fork() starts a new process that is a copy of the calling process, while exec() replaces the current process image with a new one. After a fork(), the parent and child processes run concurrently.

Why do we need separate fork () and exec () system calls instead of one combined system call that does both?

The main reason is likely that the separation of the fork() and exec() steps allows arbitrary setup of the child environment to be done using other system calls.

What is the difference between fork () and exec () on Unix?

Main differences between fork() and exec(): in a UNIX operating system, fork is a system call that lets a process copy itself, while exec replaces the existing process image with a new program. fork() makes the child process an almost exact copy of the parent process.

What is the benefit of fork system call?

It takes no arguments and returns a process ID. The purpose of fork() is to create a new process, which becomes the child process of the caller. After a new child process is created, both processes will execute the next instruction following the fork() system call.


2 Answers

Usually fork does not actually copy all the memory; instead it uses "copy on write", which means that as long as the memory is not modified, the same pages are shared between parent and child. However, to guarantee that memory will be available later on (should either process write to its pages), enough memory must be reserved up front.

This means that when forking from a large process on a system that does not allow overcommitting memory, the full amount must be available. So if an 8 GB process forks, then at least for a short period of time 16 GB must be available.

See also vfork and posix_spawn for other solutions.

answered Sep 22 '22 by Roger Lindsjö


Some systems that are either very old (early Unix), very special (MMU-less Linux) or very crappy (Windows via Cygwin) do need to make a full copy of all pages ("every byte") on fork, so the potential is there.

Modern Unix kernels do not copy all the process memory, instead choosing to make a virtual copy. While this involves only a fraction of the copying (the page tables still need to be copied), it can still amount to many megabytes and take substantial time.

So the answer is: yes, in principle. Most modern implementations use the MMU to make a fast virtual copy instead, but even that virtual copy isn't free.

Both old and some modern systems implement a special vfork() call, which has somewhat strict limitations (although less strict than the POSIX requirements for vfork) but avoids this copy, for performance reasons.

To give some actual numbers, on my GNU/Linux system, I can fork+exit 1340 times per second from a 20MB process, but only 235 times/s on a 2000MB process. In both cases it is faster to vfork+execve, which is somewhat unintuitive, because many people think "fork is fast" and "execve must be slow".

answered Sep 22 '22 by Remember Monica