I am a beginner in this area. I have studied fork(), vfork(), clone() and pthreads.
I have noticed that pthread_create() creates a thread, which has less overhead than creating a new process with fork(). Additionally, the thread shares file descriptors, memory, etc. with the parent process.
But when are fork() and clone() better than pthreads? Can you please explain with a real-world example?
Thanks in advance.
If the child will do an identical task to the parent, with identical code, use fork. For smaller subtasks use threads. For separate external processes use neither, just call them with the proper API calls.
On the other hand, OpenMP is much higher level, is more portable and doesn't limit you to using C. It's also much more easily scaled than pthreads. One specific example of this is OpenMP's work-sharing constructs, which let you divide work across multiple threads with relative ease.
With the older Solaris threads library, fork() duplicates all the threads of a process (forkall semantics). The problem with this is that such a fork() in a process whose threads work with external resources may corrupt those resources (e.g., by writing duplicate records to a file), because the duplicated threads do not know that the fork() has occurred.
Both the POSIX fork(2) function and the Solaris fork1(2) create a new process, duplicating the complete address space in the child but only the calling thread. This is useful when the child process immediately calls exec(), which is what happens after most calls to fork().
clone(2) is a Linux-specific syscall mostly used to implement threads (in particular, it is used for pthread_create). With various arguments, clone can also have a fork(2)-like behavior. Very few people use clone directly; using the pthread library is more portable. You probably need to call the clone(2) syscall directly only if you are implementing your own thread library (a competitor to POSIX threads), and this is very tricky (in particular because locking may require using the futex(2) syscall in machine-tuned assembly-coded routines, see futex(7)). You don't want to use clone or futex directly because pthreads are much simpler to use.
(The other pthread functions require some book-keeping to be done internally in libpthread.so after a clone during a pthread_create.)
As Jonathon answered, processes have their own address space and file descriptor set. A process can also execute a new executable program with the execve syscall, which basically reinitializes the address space, the stack and the registers for starting a new program (but the file descriptors may be kept, unless the close-on-exec flag is used, e.g. through O_CLOEXEC for open).
On Unix-like systems, all processes (except the very first process, usually init, of pid 1) are created by fork (or variants like vfork; you could, but don't want to, use clone in such a way that it behaves like fork).
(Technically, on Linux, there are a few weird exceptions which you can ignore, notably kernel processes or threads and some rare kernel-initiated starting of processes like /sbin/hotplug ....)
The fork and execve syscalls are central to Unix process creation (with waitpid and related syscalls).
A multi-threaded process has several threads (usually created by pthread_create), all sharing the same address space and file descriptors. You use threads when you want to work in parallel on the same data within the same address space, but then you must take care of synchronization and locking. Read a pthread tutorial for more.
I suggest reading a good Unix programming book like Advanced Unix Programming and/or the (freely available) Advanced Linux Programming.