I have a system in which two identical processes are run (let's call them replicas). When signaled, a replica duplicates itself using the fork() call. A third process randomly selects one of the replicas to kill and then signals the other to create a replacement. Functionally, the system works well; it can kill and respawn replicas all day, except for the performance issue.
The fork() call is taking longer and longer. The following is the simplest setup that still displays the problem. The timing is displayed in the graph below:
The replica's code is the following:
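```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* timestamp_t and generate_timestamp() were not shown in the original;
 * this hypothetical version returns a monotonic time in microseconds. */
typedef long long timestamp_t;

static timestamp_t generate_timestamp(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (timestamp_t)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

void restartHandler(int signo) {
    (void)signo;
    // Time the fork() call
    timestamp_t last = generate_timestamp();
    pid_t currentPID = fork();
    if (currentPID >= 0) { // Successful fork
        if (currentPID == 0) { // Child process
            timestamp_t current = generate_timestamp();
            // printf() is not async-signal-safe; used here only for measurement
            printf("%lld\n", current - last);
            fflush(stdout);
            // The child starts inside the handler with SIGUSR1 blocked, so unblock it
            sigset_t signal_set;
            sigemptyset(&signal_set);
            sigaddset(&signal_set, SIGUSR1);
            sigprocmask(SIG_UNBLOCK, &signal_set, NULL);
            return;
        } else { // Parent: reap an exited child, if any, without blocking
            waitpid(-1, NULL, WNOHANG);
            return;
        }
    } else {
        printf("Fork error!\n");
        return;
    }
}

int main(void) {
    if (signal(SIGUSR1, restartHandler) == SIG_ERR) {
        perror("Failed to register the restart handler");
        return -1;
    }
    while (1) {
        sleep(1);
    }
    return 0;
}
```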
The longer the system runs, the worse it gets.
Sorry for not having a more specific question, but does anyone have any ideas or clues as to what is going on? It seems to me that there is a resource leak in the kernel (hence the linux-kernel tag), but I don't know where to start looking.
What I have tried: /proc/<pid>/maps is not growing.

Any hints? Is there anything I can provide to help? Thanks!
fork() is a Unix system call that creates a duplicate of the calling process, so that two processes run the same program at the same time. These two processes are typically called the "parent" and the "child", and the kernel schedules them independently.
When a process calls fork(), it becomes the parent process and the newly created process is its child. After the fork, both processes not only run the same program, but both resume execution as though each had just returned from the fork() call: the call returns the child's PID in the parent and 0 in the child.
fork() creates a new process by duplicating the calling process. The new process is referred to as the child process. The calling process is referred to as the parent process. The child process and the parent process run in separate memory spaces.
Each successful invocation of fork() therefore turns one process into two: the original parent and the new child.
The slowdown is caused by an accumulation of anon_vma structures (the kernel objects used for reverse mapping of anonymous memory), and it is a known problem. It shows up when there are a large number of fork() calls and the parent exits before its children. The following code (source: Daniel Forrest) recreates the problem:
```c
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t pid;

    while (1) {
        pid = fork();
        if (pid == -1) {
            /* error */
            return 1;
        }
        if (pid) {
            /* parent: exit after 2 s, before its child does */
            sleep(2);
            break;
        }
        else {
            /* child: wait 1 s, then loop and fork the next generation */
            sleep(1);
        }
    }
    return 0;
}
```
The behavior can be confirmed by checking anon_vma in /proc/slabinfo.
There is a patch (source) which limits the length of the copied anon_vma_chain to five. I can confirm that the patch fixes the problem.
As for how I eventually found the problem, I finally just started putting printk calls throughout the fork code and checking the times shown in dmesg. Eventually I saw that it was the call to anon_vma_fork that was taking longer and longer. After that, it was a quick Google search.
It took a rather long time, so I would still appreciate any suggestions for a better way to track down a problem like this. And to all of you who already spent time trying to help me: thank you.
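Maybe you could try using the generic wait() call rather than waitpid()? It's just a guess, but a professor in undergrad told me it was better. (Note that wait() blocks if no child has terminated yet, whereas waitpid(-1, NULL, WNOHANG) returns immediately.) Also, have you tried using AddressSanitizer?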
Also, you can use GDB to debug a child process (if you haven't already tried that) with follow-fork-mode:

set follow-fork-mode child

but then GDB follows only the child and detaches from the parent. To debug both, call sleep() in the child right after the fork, get the child's PID, and attach to it from a second GDB session:

attach <child process pid>

and, when you are finished:

detach
This is also useful together with Valgrind, whose built-in gdbserver lets you check for memory leaks from inside GDB. Start your program under Valgrind with:

valgrind --vgdb-error=0...<executable>

then, from GDB, connect to it with:

target remote | vgdb

set some relevant breakpoints, and continue through your program until you hit them. Then search for leaks:

monitor leak_check full reachable any

and inspect a particular loss record with:

monitor block_list <loss_record_nr>