Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the use of fork() - ing before exec()?

Tags:

c

unix

process

In *nix systems, processes are created by using fork() system call. Consider for example, init process creates another process.. First it forks itself and creates the a process which has the context like init. Only on calling exec(), this child process turns out to be a new process. So why is the intermediate step ( of creating a child with same context as parent ) needed? Isn't that a waste of time and resource, because we are creating a context ( consumes time and wastes memory ) and then over writing it?

Why is this not implemented as allocating a vacant memory area and then calling exec()? This would save time and resources right?

like image 816
shar Avatar asked Apr 04 '13 17:04

shar


People also ask

Do you have to fork before exec?

If you want the new process to do something other than exactly what the current process is doing, you then replace it (using exec ). You do not have to fork before calling exec .

What is the purpose of using fork ()?

The purpose of fork() is to create a new process, which becomes the child process of the caller. After a new child process is created, both processes will execute the next instruction following the fork() system call. Therefore, we have to distinguish the parent from the child.

Why are fork and exec used together?

So, fork and exec are often used in sequence to get a new program running as a child of a current process. Shells typically do this whenever you try to run a program like find - the shell forks, then the child loads the find program into memory, setting up all command line arguments, standard I/O and so forth.

Why do we need separate fork () and exec () system calls instead of one combined system call that does both?

The main reason is likely that the separation of the fork() and exec() steps allows arbitrary setup of the child environment to be done using other system calls.


2 Answers

The intermediate step enables you to set up shared resources in the child process without the external program being aware of it. The canonical example is constructing a pipe:

// read output of "ls"
// (error checking omitted for brevity)
int pipe_fd[2];
pipe(&pipe_fd);
if (fork() == 0) {       // child:
    close(pipe_fd[0]);   // we don't want to read from the pipe
    dup2(pipe_fd[1], 1); // redirect stdout to the write end of the pipe
    execlp("ls", "ls", (char *) NULL);
    _exit(127);          // in case exec fails
}
// parent:
close(pipe_fd[1]);
fp = fdopen(pipe_fd[0], "r");
while (!feof(fp)) {
    char line[256];
    fgets(line, sizeof line, fp);
    ...
}

Note how the redirection of standard output to the pipe is done in the child, between fork and exec. Of course, for this simple case, there could be a spawning API that would simply do this automatically, given the proper parameters. But the fork() design enables arbitrary manipulation of per-process resources in the child — one can close unwanted file descriptors, modify per-process limits, drop privileges, manipulate signal masks, and so on. Without fork(), the API for spawning processes would end up either extremely fat or not very useful. And indeed, the process spawning calls of competing operating systems typically fall somewhere in between.

As for the waste of memory, it is avoided with the copy on write technique. fork() doesn't allocate new memory for the child process, but points the child to the parent's memory, with the instructions to make a copy of a page only if the page is ever written to. This makes fork() not only memory-efficient, but also fast, because it only needs to copy a "table of contents".

like image 74
user4815162342 Avatar answered Sep 21 '22 18:09

user4815162342


This is an old complaint. Many people have asked Why fork() first? and typically they suggest an operation that will both create a new process from scratch and run a program in it. This operation is called something like spawn().

And they always say, Won't that be faster?

And in fact, every system other than the Unix family does go the "spawn" way. Only Unix is based on fork() and exec().

But it's funny, Unix has always been much faster than other full-featured systems. It has always handled way more users and load.

And Unix has been made even faster over the years. Fork() no longer really duplicates the address space, it just shares it using a technique called copy-on-write. (A very old fork optimization called vfork() is also still around.)

Drink the Kool-Aid.

like image 21
DigitalRoss Avatar answered Sep 23 '22 18:09

DigitalRoss