Here is a little reproducible example:
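library(doMC)
library(doParallel)
registerDoMC(4)  # register the multicore backend with 4 workers
# Time a trivial parallel loop; each iteration prints and returns its index
timing <- system.time(
  fitall <- foreach(i = 1:1000, .combine = "c") %dopar% {
    print(i)
  }
)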
I start up R and look at the process table:
> system("ps -efl")
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S chbr 1 0 5 80 0 - 21399 wait 10:58 ? 00:00:00 /usr/local/lib/R/bin/exec/R --no-save --no-restore
0 S chbr 9 1 0 80 0 - 1113 wait 10:58 ? 00:00:00 sh -c ps -efl
0 R chbr 10 9 0 80 0 - 4294 - 10:58 ? 00:00:00 ps -efl
If I run the aforementioned simple foreach loop, doMC or doParallel leaves a zombie process behind. Output of ps -efl after running the loop:
> system("ps -efl")
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S chbr 1 0 4 80 0 - 25256 wait 11:00 ? 00:00:00 /usr/local/lib/R/b
1 Z chbr 10 1 0 80 0 - 0 exit 11:00 ? 00:00:00 [R] <defunct>
0 S chbr 12 1 0 80 0 - 1113 wait 11:00 ? 00:00:00 sh -c ps -efl
0 R chbr 13 12 0 80 0 - 4294 - 11:00 ? 00:00:00 ps -efl
If I repeat the loop without issuing registerDoMC(4) again, no additional zombie process gets created. However, if I do issue registerDoMC(4) again, an additional zombie process gets created:
> system("ps -efl")
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S chbr 1 0 0 80 0 - 25554 wait 11:00 ? 00:00:01 /usr/local/lib/R/b
1 Z chbr 21 1 0 80 0 - 0 exit 11:02 ? 00:00:00 [R] <defunct>
1 Z chbr 22 1 0 80 0 - 0 exit 11:02 ? 00:00:00 [R] <defunct>
0 S chbr 26 1 0 80 0 - 1113 wait 11:03 ? 00:00:00 sh -c ps -efl
0 R chbr 27 26 0 80 0 - 4294 - 11:03 ? 00:00:00 ps -efl
That's how I figured it could be doMC that is doing something it should not. If doMC is causing this, is there a way to stop it from leaving zombie processes behind? (stopCluster() does not work, as no cluster gets created in the first place.)
> sessionInfo()
R Under development (unstable) (2014-08-16 r66404)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_IE.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_IE.UTF-8 LC_COLLATE=en_IE.UTF-8
[5] LC_MONETARY=en_IE.UTF-8 LC_MESSAGES=en_IE.UTF-8
[7] LC_PAPER=en_IE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_IE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] doParallel_1.0.8 doMC_1.3.3 iterators_1.0.7 foreach_1.4.2
loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.2.0
Hence, we need to prevent the creation of zombie processes. 1. Using the wait() system call: when the parent process calls wait() after creating a child, it blocks until the child completes and then reaps the child's exit status.
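From within R, the nearest user-level counterpart to this fork-then-wait pattern is probably mcparallel() followed by mccollect() from the parallel package (Unix-alikes only). The sketch below only illustrates a parent waiting for its forked children and picking up their results. Whether the children are fully reaped at that point depends on the R version and platform, so treat that part as an assumption rather than a guarantee.

```r
library(parallel)

# Fork one child process per task; mcparallel() returns a job handle right away.
jobs <- lapply(1:4, function(i) mcparallel(i^2))

# mccollect() blocks until the listed children have finished and returns
# their results in the same order as the jobs.
squares <- unlist(mccollect(jobs), use.names = FALSE)
print(squares)
```

Running ps -efl from the same session afterwards shows whether any defunct children remain on a given setup.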
Once the process that created them is gone, zombies are adopted by the init process, which regularly performs the necessary cleanup, so to get rid of them you just have to kill the process that created them. The top command is a convenient way to see if you have any zombies. This system has eight zombie processes. We can list these by using the ps command and piping it into egrep.
The zombie processes are listed. This is a neater way to discover the process IDs of zombies than scrolling back and forth through top. We also see that an application called “badprg” spawned these zombies. The process ID of the first zombie is 7641, but we need to find the process ID of its parent process. We can do so by using ps again.
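Since the question already calls ps from inside R, the same lookup (zombie PIDs together with their parent PIDs) can be sketched in R instead of a shell pipeline. The helper names ps_lines() and list_zombies() are hypothetical, and the sketch assumes a ps that accepts the POSIX -o name= form for header-less columns.

```r
# Hypothetical helper: one line per process with PID, parent PID,
# state and command name, without a header row.
ps_lines <- function() {
  system2("ps", c("-e", "-o", "pid=", "-o", "ppid=", "-o", "stat=", "-o", "comm="),
          stdout = TRUE)
}

# Hypothetical helper: parse those lines and keep processes whose state starts with "Z".
list_zombies <- function(lines) {
  pattern <- "^[[:space:]]*([0-9]+)[[:space:]]+([0-9]+)[[:space:]]+([^[:space:]]+)[[:space:]]+(.*)$"
  m <- regmatches(lines, regexec(pattern, lines))
  m <- m[vapply(m, length, integer(1)) == 5]   # drop lines that did not match
  df <- data.frame(
    pid  = as.integer(vapply(m, `[`, "", 2)),
    ppid = as.integer(vapply(m, `[`, "", 3)),
    stat = vapply(m, `[`, "", 4),
    comm = vapply(m, `[`, "", 5),
    stringsAsFactors = FALSE
  )
  df[substr(df$stat, 1, 1) == "Z", , drop = FALSE]
}

print(list_zombies(ps_lines()))
```

The ppid column of the result is the process to look at: in the question's output it is the R session itself.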
If the parent never calls wait(), the PCB and the entry in the process table won’t be removed when the child process terminates, so the zombie state is never cleared from the PCB. Zombies do use a bit of memory, but they don’t usually pose a problem. The entry in the process table is small, but, until it’s released, the process ID can’t be reused.
This really has nothing to do with foreach or doMC; as Steve Weston has pointed out in answer to other StackOverflow queries, doMC is essentially just a wrapper for mclapply, and you can see zombie processes created with a simple call to mclapply:
library(parallel)
mclapply(rep(5,4), rnorm)
On my system, this leaves two zombie processes:
[richcalaway@richcalaway-pc ~]$ ps -efl | grep defunct
1 Z 1660945517 28701 28624 0 77 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
1 Z 1660945517 28702 28624 0 78 0 - 0 exit 12:00 pts/1 00:00:00 [R] <defunct>
0 S 1660945517 28704 28308 0 78 0 - 15306 pipe_w 12:00 pts/2 00:00:00 grep defunct
Under normal circumstances, these zombie processes won't cause any trouble, and they do disappear when the R session ends. You can avoid them by using doParallel and a fork cluster instead of using doMC.
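A minimal sketch of that suggestion, reusing the loop from the question. It assumes a Unix-alike system, since makeForkCluster() from the parallel package is not available on Windows, and the worker count of 4 is simply carried over from the question.

```r
library(doParallel)  # also attaches foreach, iterators and parallel

# Create a fork-based cluster and register it as the foreach backend.
cl <- makeForkCluster(4)
registerDoParallel(cl)

fitall <- foreach(i = 1:1000, .combine = "c") %dopar% {
  i
}

# Unlike registerDoMC(), there is now a cluster object to shut down explicitly.
stopCluster(cl)
```

Because the workers belong to an explicit cluster object, stopCluster() has something to act on here, which was the missing piece in the doMC setup.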
Cheers,
Rich Calaway
Principal Program Manager
Revolution Analytics