
How to use fork() in unix? Why not something of the form fork(pointerToFunctionToRun)?

I am having some trouble understanding how to use Unix's fork(). When I need parallelism, I am used to spawning threads in my application. It's always something of the form

CreateNewThread(MyFunctionToRun);

void MyFunctionToRun() { ... }

Now, when learning about Unix's fork(), I was given examples of the form:

fork();
printf("%d\n", 123);

in which the code after the fork is "split up". I can't understand how fork() can be useful. Why doesn't fork() have a similar syntax to the above CreateNewThread(), where you pass it the address of a function you want to run?

To accomplish something similar to CreateNewThread(), I'd have to be creative and do something like

// pseudocode
id = fork();

if (id == 0) { // I'm the child
    FunctionToRun();
} else { // I'm the parent
    wait();
}

Maybe the problem is that I am so used to spawning threads the .NET way that I can't think clearly about this. What am I missing here? What are the advantages of fork() over CreateNewThread()?

PS: I know fork() will spawn a new process, while CreateNewThread() will spawn a new thread.

Thanks

devoured elysium asked Nov 12 '10 01:11



1 Answer

fork() says "copy the current process state into a new process and start it running from right here." Because the code is then running in two processes, it in fact returns twice: once in the parent process (where it returns the child process's process identifier) and once in the child (where it returns zero).
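To make the "returns twice" behavior concrete, here is a minimal sketch. The helper name fork_and_collect() is hypothetical, not part of any library; it forks once, lets the child exit with a chosen status, and has the parent collect that status with waitpid().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once. The child exits with child_status;
 * the parent returns the status it collected, or -1 on error. */
int fork_and_collect(int child_status)
{
    fflush(stdout);      /* keep buffered output from being copied into the child */
    pid_t pid = fork();  /* on success this call returns in both processes */

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0. */
        _exit(child_status);
    }

    /* Parent: fork() returned the child's process ID. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_and_collect(7) from main() should hand 7 back to the parent, since the child's only job here is to exit with that value.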

There are a lot of restrictions on what it is safe to call in the child process after fork() (see below). The expectation is that the fork() call was part one of spawning a new process running a new executable with its own state. Part two of this process is a call to execve() or one of its variants, which specifies the path to an executable to be loaded into the currently running process, the arguments to be provided to that process, and the environment variables to surround that process. (There is nothing to stop you from re-executing the currently running executable and providing a flag that will make it pick up where the parent left off, if that's what you really want.)
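The two-part sequence described above might be sketched as follows. run_program() is a hypothetical helper name, and the exit code 127 for a failed exec is a shell convention adopted here as an assumption, not something fork() or execve() mandates.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the executable at path in a child process with
 * the given argument vector and environment. Returns the child's exit
 * status, 127 if execve() failed, or -1 on error. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();              /* part one: duplicate this process */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execve(path, argv, envp);    /* part two: replace the child's image */
        _exit(127);                  /* reached only if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that the child does nothing between fork() and execve(), which keeps it within the restrictions discussed at the end of this answer.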

The UNIX fork()-exec() dance is roughly the equivalent of the Windows CreateProcess(). A newer function is even more like it: posix_spawn().
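For comparison, a rough posix_spawn() equivalent could look like this. spawn_program() is a hypothetical wrapper name; the sketch passes NULL for the file actions and attributes (the defaults) and hands the child the parent's environment through environ.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;  /* the calling process's environment */

/* Hypothetical wrapper: spawns the executable at path and waits for it.
 * Returns the child's exit status, or -1 on error. */
int spawn_program(const char *path, char *const argv[])
{
    pid_t pid;

    /* posix_spawn() returns an error number directly instead of -1 with errno. */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

There is no visible fork() here: process creation and loading the new executable happen in one call, much as with CreateProcess().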

As a practical example of using fork(), consider a shell, such as bash. fork() is used all the time by a command shell. When you tell the shell to run a program (such as echo "hello world"), it forks itself and then execs that program. A pipeline is a collection of forked processes with stdout and stdin rigged up appropriately by the parent in between fork() and exec().
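A two-stage pipeline along those lines might be wired up roughly like this. run_pipeline() is a hypothetical name, and a real shell does a good deal more (job control, signal handling, redirections); this is only a sketch of the plumbing done between fork() and exec().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs "left | right" and returns the exit status of
 * the right-hand command, or -1 on error. */
int run_pipeline(char *const left[], char *const right[])
{
    int fds[2];  /* fds[0] is the read end, fds[1] is the write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t lpid = fork();
    if (lpid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO);  /* left's stdout feeds the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);                   /* reached only if exec failed */
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO);   /* right's stdin drains the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or the reader never sees end-of-file. */
    close(fds[0]);
    close(fds[1]);

    int status;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each side is an ordinary NULL-terminated argument vector, so the equivalent of echo hello | grep hello would pass {"echo", "hello", NULL} and {"grep", "hello", NULL}.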

If you want to create a new thread, you should use the POSIX threads library. You create a new POSIX thread (pthread) using pthread_create(). Your CreateNewThread() example would look like this:

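#include <pthread.h>

/* Pthread start routines are expected to accept and return void *. */
void *MyFunctionToRun(void *arg)
{
    (void)arg; /* unused in this example */
    /* ... */
    return NULL;
}

int main(void)
{
    pthread_t thread;
    int error = pthread_create(&thread,
            NULL /* use default thread attributes */,
            MyFunctionToRun,
            NULL /* argument */);
    if (error == 0)
        pthread_join(thread, NULL); /* wait for the thread to finish */
    return error;
}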

Before threads were available, fork() was the closest thing UNIX provided to multithreading. Now that threads are available, usage of fork() is almost entirely limited to spawning a new process to execute a different executable.

The restrictions mentioned above exist because fork() predates multithreading, so only the thread that calls fork() continues to execute in the child process. Per POSIX:

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. [THR] [Option Start] Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls. [Option End]

When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not asynch-signal-safe, the behavior is undefined.

Because any library function you call could have spawned a thread on your behalf, the paranoid assumption is that you are always limited to executing async-signal-safe operations in the child process between calling fork() and exec().
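As an illustration of staying within that limit, here is a sketch of a child that reports back to its parent using only calls POSIX lists as async-signal-safe: close(), write(), and _exit(). The helper name child_report() and the message text are made up for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that sends a short message through a
 * pipe using only async-signal-safe calls. Copies the message into buf
 * (NUL-terminated) and returns its length, or -1 on error. */
ssize_t child_report(char *buf, size_t buflen)
{
    static const char msg[] = "child alive";
    int fds[2];

    if (buflen == 0 || pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        /* write() is async-signal-safe; printf() is not. */
        ssize_t written = write(fds[1], msg, sizeof msg - 1);
        /* _exit() skips atexit handlers and stdio flushing, unlike exit(). */
        _exit(written == (ssize_t)(sizeof msg - 1) ? 0 : 1);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], buf, buflen - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

The child never touches malloc(), stdio, or any lock another thread might have held at the moment of the fork, which is exactly the class of problem the POSIX wording is guarding against.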

Jeremy W. Sherman answered Sep 23 '22 05:09