
Linux fork function compared to Windows' CreateProcess - what gets copied?

Tags: c++, c, linux, fork

I am porting a Windows application to Linux. On Windows I use CreateProcess to run child processes and redirect all of their standard streams (in, out, error). Stream redirection is critical: the main process sends data to the children and receives their output and error messages. The main process is very large, with a lot of memory and many threads, while the child processes are small.

On Linux, fork appears to offer functionality similar to CreateProcess. However, the manual says that fork creates a copy of the parent process, including code, data and stack. Does that mean that if I fork a huge process that uses 1 GB of memory just to run a very simple command-line tool that itself needs 1 MB, I first have to duplicate 1 GB of memory with fork and then replace that 1 GB with the 1 MB process? So if I have 100 threads starting children, will I need 100 GB of memory to run 100 processes that need only 100 MB in total?

Also, what about the other threads in the parent process that "don't know" about the fork: what happens to them? What does fork do under the hood, and is it really an efficient way to create a lot of small child processes from a huge parent?

Vitalii asked Feb 12 '14 15:02


4 Answers

I haven't used CreateProcess, but fork() does not produce an exact copy of the process (the child has its own process ID and only one thread, for example). It creates a child process that starts executing at the point where the parent's fork call returns and continues from there.

I recommend taking a look at Chapter 5 of the book Operating Systems: Three Easy Pieces. It should get you started, and you may find the child-spawning call you're looking for there.

ArthurChamz answered Oct 10 '22 09:10


When you call fork(), initially only the virtual memory mappings (the page tables) are copied, and all pages are marked copy-on-write. The child gets a logical copy of the parent process's address space, but it consumes almost no additional RAM until one of the two processes actually writes to a page.

As for threads, fork creates only one thread in the child process: a copy of the thread that called fork. The other threads keep running in the parent and do not exist in the child.

Also, as soon as you call one of the exec family of functions (which I assume you want to), the entire process image is replaced with a new one; only the open file descriptors that are not marked close-on-exec are kept.
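Because descriptors survive exec, the usual way to redirect a child's standard streams is to create pipes, fork, connect the pipe ends to stdin/stdout with dup2 in the child, and then exec. The following is only a minimal sketch of that pattern: run_with_pipes is a hypothetical helper name, it assumes the input is small enough to fit in the pipe buffer, and it leaves stderr untouched (a third pipe would be handled the same way).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs cmd[0] with stdin and stdout connected to
 * pipes, feeds it `input`, and stores up to out_cap - 1 bytes of its
 * output in `out`. Returns the child's exit status, or -1 on error.
 * Assumes `input` fits in the pipe buffer; larger data needs poll() or
 * a separate writer to avoid a deadlock. */
int run_with_pipes(char *const cmd[], const char *input,
                   char *out, size_t out_cap)
{
    int to_child[2], from_child[2];
    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wire the pipe ends to stdin/stdout, then exec. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execvp(cmd[0], cmd);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: keep only the ends it uses. */
    close(to_child[0]);
    close(from_child[1]);

    if (write(to_child[1], input, strlen(input)) < 0)
        perror("write");
    close(to_child[1]); /* signals end-of-file on the child's stdin */

    size_t total = 0;
    ssize_t n;
    while (total + 1 < out_cap &&
           (n = read(from_child[0], out + total, out_cap - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(from_child[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with a command array such as {"cat", NULL} and a short input string should hand the same text back through `out`, provided cat is available on the PATH.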

If your parent process has a lot of open file descriptors, I suggest that the child go through /proc/self/fd and close every descriptor it doesn't need before calling exec.
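One way to walk /proc/self/fd might look like the sketch below. close_fds_above is a hypothetical helper, the approach is Linux-specific, and it is a sketch rather than a hardened implementation: in a heavily multithreaded parent, POSIX only guarantees async-signal-safe calls between fork and exec, and opendir and realloc are not among them.

```c
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: closes every open descriptor greater than
 * `keep_upto` by listing /proc/self/fd (Linux-specific). Returns the
 * number of descriptors closed, or -1 on error. */
int close_fds_above(int keep_upto)
{
    DIR *dir = opendir("/proc/self/fd");
    if (!dir)
        return -1;

    /* Collect first and close afterwards, so the listing is not
     * modified while it is being read. */
    int *fds = NULL;
    size_t count = 0, cap = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(entry->d_name, &end, 10);
        if (end == entry->d_name || *end != '\0')
            continue; /* skips "." and ".." */
        if (fd <= keep_upto)
            continue;
        if (count == cap) {
            size_t new_cap = cap ? cap * 2 : 64;
            int *grown = realloc(fds, new_cap * sizeof *fds);
            if (!grown) {
                free(fds);
                closedir(dir);
                return -1;
            }
            fds = grown;
            cap = new_cap;
        }
        fds[count++] = (int)fd;
    }
    closedir(dir); /* also closes the descriptor used for the listing */

    int closed = 0;
    for (size_t i = 0; i < count; i++)
        if (close(fds[i]) == 0) /* the listing's own fd fails with EBADF */
            closed++;
    free(fds);
    return closed;
}
```

In the child, close_fds_above(2) would keep only stdin, stdout and stderr. Opening descriptors with O_CLOEXEC in the parent is an alternative that avoids the scan altogether.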

Sergey L. answered Oct 10 '22 09:10


fork basically splits your process into two, with both the parent and the child continuing at the instruction after the fork call. However, the return value in the child process is 0, whilst in the parent process it is the process ID of the child.
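The two return values are what let one piece of code take different paths in parent and child. A minimal sketch (fork_and_wait is a hypothetical helper name, not part of any library):

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child exit with child_status,
 * and returns the status the parent collected (or -1 on error). */
int fork_and_wait(int child_status)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: fork() returned 0. */
        _exit(child_status);
    }

    /* Parent: fork() returned the child's process ID. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child uses _exit rather than exit so that it does not flush stdio buffers inherited from the parent.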

The creation of the child process is extremely quick since it uses the same pages as the parent. The pages are marked as copy-on-write (COW) so that if either process changes a page, the other isn't affected. Once the child process exists, it usually calls one of the exec functions to replace itself with a new image. Windows doesn't have an equivalent to fork; instead, the CreateProcess call only allows you to start a new process.
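The visible effect of copy-on-write can be checked with a small experiment: the child overwrites a buffer it inherited, and the parent's view of the same buffer stays intact. This is only an illustrative sketch; parent_unaffected_by_child_write is a made-up name, and it shows the isolation between the processes rather than measuring memory use.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the parent's buffer is unchanged after the child
 * overwrote its own copy, 0 if not, and -1 on error.
 * `size` must be greater than zero. */
int parent_unaffected_by_child_write(size_t size)
{
    char *buf = malloc(size);
    if (!buf)
        return -1;
    memset(buf, 'P', size);

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        /* Child: each write gives it a private copy of the touched page. */
        memset(buf, 'C', size);
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int unchanged = (buf[0] == 'P' && buf[size - 1] == 'P');
    free(buf);
    return (child_ok && unchanged) ? 1 : 0;
}
```

A child that calls exec right away never performs such writes, so the pages of the large parent are never duplicated.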

There is a Linux-specific alternative to fork called clone, which gives you much more control over what happens when the new process is started. For example, you can specify a function to call in the new process.
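A rough sketch of the glibc clone wrapper used that way is shown below. The names child_entry, clone_and_wait and CHILD_STACK_SIZE are invented for the example, and the flag choice (just SIGCHLD, so the child behaves much like a forked one) is an assumption about what is wanted, not the only option.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Entry point of the new process; its return value becomes the
 * child's exit status. */
static int child_entry(void *arg)
{
    int code = *(int *)arg;
    return code;
}

/* Hypothetical helper: starts child_entry in a new process via clone()
 * and returns its exit status, or -1 on error. */
int clone_and_wait(int code)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (!stack)
        return -1;

    /* The stack grows downward on most architectures, so pass its top.
     * SIGCHLD makes the child waitable like a forked one; without
     * CLONE_VM it gets its own copy-on-write address space. */
    pid_t pid = clone(child_entry, stack + CHILD_STACK_SIZE, SIGCHLD, &code);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Flags such as CLONE_VM or CLONE_FILES change what is shared with the parent, which is where the extra control comes from.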

Sean answered Oct 10 '22 11:10


The copies are "copy-on-write", so if your child process does not modify the data, it will not use any memory beyond that of the parent process. Typically, after a fork(), the child process calls exec() to replace its program with a different one, and then all of the inherited memory is dropped anyway.

Alfe answered Oct 10 '22 10:10