What's the fastest, best way on modern Linux of achieving the same effect as a fork-execve combo from a large process?
My problem is that the forking process is ~500 MB in size, and a simple benchmark achieves only about 50 forks/s from it (cf. ~1600 forks/s from a minimally sized process), which is too slow for the intended application.
Some googling turns up vfork as having been invented as the solution to this problem... but also warnings not to use it. Modern Linux seems to have acquired the related clone and posix_spawn calls; are these likely to help? What's the modern replacement for vfork?
I'm using 64-bit Debian Lenny on an i7 (the project could move to Squeeze if posix_spawn would help).
The fork() system call creates an exact duplicate of the address space from which it is called, resulting in two address spaces executing the same code. Problems can occur if the forking address space has multiple threads executing at the time of the fork().
Threading runs multiple lines of execution intra-process. Forking is a means of creating new processes.
fork() creates a new process by duplicating the calling process. The new process is referred to as the child process. The calling process is referred to as the parent process. The child process and the parent process run in separate memory spaces.
Threads are lines of execution that run in parallel within one process; a fork creates a new process that inherits from its parent. Threads are good for executing a task in parallel, while forked processes are independent and also run simultaneously.
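The "separate memory spaces" point can be illustrated with a small sketch. The function name `parent_value_after_child_write` is invented for this example: the child changes its own copy of a variable, and the parent's copy stays untouched.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo (name invented for illustration): fork a child that
 * overwrites its copy of `counter`, then return the parent's copy.
 * Returns -1 if fork or waitpid fails or the child misbehaves. */
int parent_value_after_child_write(void)
{
    int counter = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        counter = 42;                   /* changes only the child's copy */
        _exit(counter == 42 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return counter;                     /* the parent's copy is unchanged */
}
```

With threads, by contrast, a write like this would be visible to every thread in the process, since they share one address space.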
On Linux, you can use posix_spawn(3) with the POSIX_SPAWN_USEVFORK flag to avoid the overhead of copying page tables when forking from a large process.
See Minimizing Memory Usage for Creating Application Subprocesses for a good summary of posix_spawn(3), its advantages, and some examples.
To take advantage of vfork(2), make sure you #define _GNU_SOURCE before #include <spawn.h>, and then simply call posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK).
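Put together, the steps above might look like the following minimal sketch. The wrapper name `spawn_and_wait` is hypothetical, and the `#ifdef` guard is an addition of this sketch so that it still compiles on C libraries that do not define the flag. Note also that the posix_spawn(3) man page says the flag has no effect since glibc 2.24, so it should only matter on older systems such as Lenny.

```c
#define _GNU_SOURCE             /* exposes POSIX_SPAWN_USEVFORK on glibc */
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical wrapper (name invented for illustration): spawn the program
 * at the full path argv[0] with the caller's environment, wait for it, and
 * return its exit status, or -1 on any failure. */
int spawn_and_wait(char *const argv[])
{
    posix_spawnattr_t attr;
    int err = posix_spawnattr_init(&attr);
    if (err != 0)
        return -1;

#ifdef POSIX_SPAWN_USEVFORK
    /* Ask older glibc to use vfork() rather than fork() internally. */
    err = posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK);
#endif

    pid_t pid = -1;
    if (err == 0)
        err = posix_spawn(&pid, argv[0], NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing an argv of `{"/bin/sh", "-c", "exit 7", NULL}` should make the function return 7. Use posix_spawnp instead of posix_spawn if the program should be looked up via PATH.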
I can confirm that this works on Debian Lenny, and provides a massive speed-up when forking from a large process.
Benchmarking the various spawns over 1000 runs at 100M RSS:

                            user     system      total         real
    fspawn (fork/exec):    0.100000  15.460000  40.570000 ( 41.366389)
    pspawn (posix_spawn):  0.010000   0.010000   0.540000 (  0.970577)
Outcome: I was going to go down the early-spawned helper subprocess route as suggested by other answers here, but then I came across this on using huge page support to improve fork performance.
Having tried it myself using libhugetlbfs to simply make all my app's mallocs allocate huge pages, I'm now getting around 2400 forks/s regardless of the process size (over the range I'm interested in anyway). Amazing.