
Perl - responsible forking

Tags: fork, perl

I have newly discovered Perl forking and I am quite in love. But one thing concerns me -- if I am just splitting off processes left and right, surely this will cause an issue somewhere. Is there a reasonable check that one should use to make sure that my little application doesn't eat up all of my machine's resources?

Take this sample code:

foreach my $command (@commands) {
   my $pid = fork();
   if (!defined $pid) {
      # Fork failed. Do something; here we simply stop.
      die "fork failed: $!";
   } elsif ($pid == 0) { # This is the child.
      system($command);
      exit(0);
   }
}

while (wait() != -1) {} # wait() returns -1 once there are no children left to reap.

This works fine and spawns a process to handle each command. Everything runs in parallel, which is great if the commands are completely independent.

What if I suddenly have 5,000+ commands to run through? It wouldn't be wise to thoughtlessly fork off that many processes. So what kind of check should be implemented, and how?


asked Feb 09 '12 07:02 by Jeremy


3 Answers

Also, if you are worried about spawning too many forked processes at the same time, you can throttle them.

Either roll your own (using a queue to hold the commands that are still waiting to be forked) or, better yet, use the Parallel::ForkManager module, which lets you limit the number of simultaneous forks via a constructor parameter.

use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

Please note that ForkManager will ALSO take care of reaping finished child processes for you via the "wait*" methods it provides (for example, wait_all_children).
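For what it's worth, here is a minimal sketch of how the question's loop might look with Parallel::ForkManager (a CPAN module, so it has to be installed first). The command list and the limit of 2 are placeholders made up for illustration, and the run_on_finish callback is only there to count reaped children.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Placeholder job list and limit, assumed only for this illustration.
my @commands      = ('true', 'true', 'true', 'true');
my $MAX_PROCESSES = 2;

my $pm       = Parallel::ForkManager->new($MAX_PROCESSES);
my $finished = 0;

# Called in the parent every time a child has been reaped.
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    $finished++;
});

foreach my $command (@commands) {
    # start() blocks while $MAX_PROCESSES children are already running.
    # It returns the child's pid in the parent and 0 in the child.
    $pm->start($command) and next;

    # Child: run the command, then exit through the manager.
    my $status = system($command);
    $pm->finish($status == 0 ? 0 : 1);
}

# Parent: block until every remaining child has been reaped.
$pm->wait_all_children;
print "$finished of ", scalar(@commands), " commands finished\n";
```

With 5,000 commands the loop stays the same; only $MAX_PROCESSES decides how many run at once.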

answered Nov 07 '22 12:11 by DVK


When a child exits, the parent is sent a SIGCHLD signal. You'll want to reap these children; if you don't, they will sit as zombies in the process table until that final wait loop at the end of your script.

Chapter 16 of O'Reilly's Perl Cookbook on Google books provides a bunch of information on this. Basically, you need to increment a counter when you're forking children, and decrement it when you're reaping them, and not fork new ones past a reasonable max of currently running children.
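A rough sketch of that counter approach using only core Perl might look like the following. This is not the Cookbook's code; the run_throttled helper, its return values, and the 'true' placeholder commands are all made up for illustration, and it assumes the script has no other child processes.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run each command in its own child process,
# keeping at most $max children alive at any one time.
sub run_throttled {
    my ($max, @commands) = @_;
    my $running = 0;   # children currently alive (the counter)
    my $peak    = 0;   # highest value the counter reached
    my $reaped  = 0;   # children collected so far

    foreach my $command (@commands) {
        # At the limit: block until one child exits, then free its slot.
        while ($running >= $max) {
            if (wait() == -1) { $running = 0; last; }   # no children left
            $running--;
            $reaped++;
        }

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: replace this process with the command.
            # If exec fails, leave immediately without running parent code.
            exec($command) or POSIX::_exit(127);
        }

        # Parent: one more child is running.
        $running++;
        $peak = $running if $running > $peak;
    }

    # Reap whatever is still running so no zombies are left behind.
    while (wait() != -1) {
        $running--;
        $reaped++;
    }
    return ($reaped, $peak);
}

# Placeholder usage: five no-op commands, at most two at a time.
my ($done, $max_seen) = run_throttled(2, ('true') x 5);
print "reaped $done children, peak concurrency $max_seen\n";
```

Blocking in wait() keeps the bookkeeping in one place. A $SIG{CHLD} handler that calls waitpid(-1, WNOHANG) is the other common route, but then the counter is modified from inside a signal handler.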

What a "reasonable max" is depends on the hardware and on what those forked processes are doing. There's no static answer to that question other than to say: test what you're doing and look at the performance impact on the machine. Preferably during business hours. After telling the sysadmin what you're doing. He/she may even have some advice.

answered Nov 07 '22 12:11 by Brian Roach


To make sure you don't spawn more processes than the system can efficiently handle, you can use a module like Parallel::ForkManager.

answered Nov 07 '22 12:11 by Damodharan R