I have newly discovered Perl forking and I am quite in love. But one thing concerns me: if I am just splitting off processes left and right, surely this will cause a problem somewhere. Is there a reasonable kind of check I should use to make sure that my little application doesn't eat up all of the resources of my machine?
Take this sample code:
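use strict;
use warnings;

my @commands = ('sleep 1', 'echo done');   # example commands only

foreach my $command (@commands) {
    my $pid = fork();
    if (!defined $pid) {
        # Fork failed. Do something.
        die "fork failed: $!";
    } elsif ($pid == 0) {   # This is the child.
        system($command);
        exit(0);
    }
}
while (wait() != -1) {}   # wait() returns -1 once every child has been reaped.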
So this will work fine, and spawn off a process to handle each command. It will all happen in parallel which is great if these commands are completely independent.
What if I suddenly have 5,000+ commands to run through? It wouldn't be wise to thoughtlessly fork off that many processes. So what kind of check should be implemented, and how?
If you are worried about spawning too many forked processes at the same time, you can throttle them.
Either roll your own (using a queue to hold the commands still waiting to be forked), or, better yet, use the Parallel::ForkManager module, which lets you limit the number of simultaneous forks via a constructor parameter:
use Parallel::ForkManager;
my $MAX_PROCESSES = 10;   # example value; tune it for your machine
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
Note that Parallel::ForkManager will also take care of reaping finished child processes for you, via its wait* methods such as wait_all_children.
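A fuller sketch of the throttled loop with Parallel::ForkManager follows. It assumes the module is installed from CPAN (it is not part of core Perl), that the commands are plain shell strings, and that `true`/`false` exist as on a typical Unix system; the helper name `run_with_forkmanager` is hypothetical. `start` forks (waiting first if the limit is already reached), `finish` ends the child with an exit code, `run_on_finish` collects that code in the parent, and `wait_all_children` reaps whatever is left.

```perl
use strict;
use warnings;
use Parallel::ForkManager;   # CPAN module, not part of core Perl

# Hypothetical helper: run each command in a child process, with at most
# $max_processes children alive at once. Returns the number of failures.
sub run_with_forkmanager {
    my ($max_processes, @commands) = @_;
    my $pm = Parallel::ForkManager->new($max_processes);
    my $failures = 0;

    # Runs in the parent each time a child is reaped.
    $pm->run_on_finish(sub {
        my ($pid, $exit_code) = @_;
        $failures++ if $exit_code != 0;
    });

    for my $command (@commands) {
        # start() waits while $max_processes children are running, then
        # forks: it returns the child's pid in the parent (so the parent
        # moves on to the next command) and 0 in the child.
        $pm->start and next;

        system($command);
        $pm->finish($? == 0 ? 0 : 1);   # the child exits here with this code
    }

    $pm->wait_all_children;   # block until every child has been reaped
    return $failures;
}
```

For the 5,000-command case this would be called as `run_with_forkmanager($MAX_PROCESSES, @commands)`; only the limit decides how many run in parallel.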
When a child exits, the parent receives a SIGCHLD signal. You'll want to reap children as they exit; if you don't, they will sit in the process table as zombies until that final call to wait() at the end of your script.
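Reaping children as they exit can be sketched with a SIGCHLD handler that calls waitpid with WNOHANG in a loop; the loop matters because several child exits can be coalesced into a single signal. This is a minimal sketch assuming a Unix-like system; `reap_children`, `spawn` and `%exit_status` are hypothetical names. Because the handler collects every exit status itself, the main program should read `%exit_status` rather than calling wait() on these children.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %exit_status;   # pid => $? for each child the handler has reaped

# SIGCHLD handler (hypothetical name): reap every child that has already
# exited, without blocking, so none of them lingers as a zombie.
sub reap_children {
    local ($!, $?);   # don't clobber $! and $? in the main program
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $?;
    }
}
$SIG{CHLD} = \&reap_children;

# Hypothetical helper: fork a child that runs $command and return its pid.
sub spawn {
    my ($command) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: replace this process with the command; 127 if exec fails.
        exec($command) or POSIX::_exit(127);
    }
    return $pid;
}
```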
Chapter 16 of O'Reilly's Perl Cookbook (available on Google Books) provides a lot of information on this. Basically, you increment a counter when you fork a child, decrement it when you reap one, and don't fork new children once the number currently running reaches a reasonable maximum.
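The counter approach can be sketched in plain Perl with no extra modules. This is a minimal sketch, not the Cookbook's own code: the helper name `run_throttled` is hypothetical, commands are assumed to be shell strings, and `true`/`false` are assumed to exist as on a typical Unix system. Unlike the question's code, the child uses exec rather than system, so each command replaces the child instead of spawning a grandchild, and its exit status reaches the parent.

```perl
use strict;
use warnings;
use POSIX ();   # for POSIX::_exit

# Hypothetical helper: run every command in its own child process, keeping
# at most $max_children alive at once. Returns how many commands failed.
sub run_throttled {
    my ($max_children, @commands) = @_;
    my $running  = 0;   # incremented on fork, decremented on reap
    my $failures = 0;

    for my $command (@commands) {
        # At the limit: block until one child exits before forking again.
        if ($running >= $max_children) {
            if (wait() != -1) {
                $running--;
                $failures++ if $? != 0;
            }
        }

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: replace this process with the command; 127 if exec fails.
            exec($command) or POSIX::_exit(127);
        }
        $running++;   # parent
    }

    # Reap the children that are still running.
    while (wait() != -1) {
        $failures++ if $? != 0;
    }
    return $failures;
}
```

With 5,000+ commands, `run_throttled(8, @commands)` would keep at most 8 of them running at any moment; the limit of 8 is only an example value.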
As for what a "reasonable max" is: that depends on the hardware and on what the forked processes are doing. There's no static answer other than to test what you're doing and watch the performance impact on the machine. Preferably during business hours, after telling the sysadmin what you're doing. He or she may even have some advice.
To make sure you don't spawn more processes than the system can efficiently handle, you can use a module like Parallel::ForkManager.