I am porting a Windows application to Linux. On Windows I use CreateProcess to run child processes and redirect all standard streams (in, out, error). Stream redirection is critical: the main process sends data to the children and receives their output and error messages. The main process is very large, with a lot of memory and many threads, while the child processes are small. On Linux I see that the fork function provides functionality similar to CreateProcess on Windows.
However, the manual says that fork "creates parent process copy", including code, data and stack. Does this mean that if I copy a huge process that uses 1 GB of memory just to run a very simple command-line tool that itself uses 1 MB, I first need to duplicate 1 GB of memory with fork and then replace that 1 GB with a 1 MB process? So if I have 100 threads, will I need 100 GB of memory to run 100 processes that need only 100 MB in total? Also, what about the other threads in the parent process that "don't know" about the fork call: what will they do? What does fork do "under the hood", and is it really an efficient way to create a lot of small child processes from a huge parent?
I haven't used CreateProcess, but fork() does not give you an exact copy of the process in every respect. It creates a child process, and the child starts executing at the same point at which the parent called fork and continues from there.
I recommend taking a look at Chapter 5 of the Three Easy Pieces OS book. This may get you started and you might find the child spawning call you're looking for.
When you call fork(), initially only your virtual memory (VM) mappings are copied, and all pages are marked copy-on-write. Your new child process will have a logical copy of your parent process's VM, but it will not consume additional RAM for those pages until one of the processes actually writes to them.
As for threads, fork creates only one thread in the child process, which is a copy of the calling thread. The other threads of the parent keep running in the parent and do not exist in the child.
Also, as soon as you call any of the exec family of functions (which I assume you want to), your entire process image is replaced with a new one; open file descriptors are kept unless they are marked close-on-exec.
If your parent process has a lot of open file descriptors, I suggest you go through /proc/self/fd in the child and close all the file descriptors that you don't need.
fork basically splits your process into two, with both the parent and child processes continuing at the instruction after the fork function call. However, the return value in the child process is 0, whilst in the parent process it is the process ID of the child process.
The creation of the child process is extremely quick since it uses the same pages as the parent. The pages are marked as copy-on-write (COW), so that if either process changes a page, the other won't be affected. Once the child process exists, it usually calls one of the exec functions to replace itself with a new image. Windows doesn't have an equivalent to fork; instead, the CreateProcess call only allows you to start a new process.
There is an alternative to fork called clone, which gives you much more control over what happens when the new process is started. For example, you can specify a function to call in the new process.
The copies are "copy-on-write", so if your child process does not modify the data, it will not use any memory beyond that of the parent process. Typically, after a fork(), the child process calls exec() to replace the program of this process with a different one, and then all of that memory is dropped anyway.