 

How to convert a nested for loop into a multithreading program in Perl

I need help converting a nested for loop into a multithreading program in Perl, e.g.

for ( my $i = 0; $i < 100; $i++ ) {
    for ( my $j = 0; $j < 100; $j++ ) {
        for ( my $k = 0; $k < 100; $k++ ) {
            # do something ....
        }
    }
}

Is there a way I can split the first loop as below and run the parts in parallel?

#Job1:
for ( my $i = 0; $i < 40; $i++ ) {
    for ( my $j = 0; $j < 100; $j++ ) {
        for ( my $k = 0; $k < 100; $k++ ) {
            # do something ....
        }
    }
}

#Job2:
for ( my $i = 40; $i < 80; $i++ ) {
    for ( my $j = 0; $j < 100; $j++ ) {
        for ( my $k = 0; $k < 100; $k++ ) {
            # do something ....
        }
    }
}

#Job3:
for ( my $i = 80; $i < 100; $i++ ) {
    for ( my $j = 0; $j < 100; $j++ ) {
        for ( my $k = 0; $k < 100; $k++ ) {
            # do something ....
        }
    }
}

How can I run each job in parallel, and exit the main program only when all of the sub-jobs Job1, Job2, and Job3 are complete?

asked Oct 16 '14 by user3754136


1 Answer

I'll offer a reference to a similar answer I've used before - the key question is: are your jobs completely decoupled, i.e. no data needs to move between them?

If so, use Parallel::ForkManager. It goes a bit like this:

use strict;
use warnings;
use Parallel::ForkManager;

my $fork_manager = Parallel::ForkManager->new(10);    # run up to 10 in parallel

for ( my $i = 0; $i < 100; $i++ ) {
    # fork; the parent immediately moves on to the next $i
    $fork_manager->start and next;
    for ( my $j = 0; $j < 100; $j++ ) {
        for ( my $k = 0; $k < 100; $k++ ) {
            # do something ....
        }
    }
    $fork_manager->finish;    # the child exits here
}
$fork_manager->wait_all_children();

This will, for each iteration of $i, fork the code and run it in parallel - and ForkManager will cap the concurrency at 10.

This number should roughly match the limiting factor in your parallelism - if that's CPU, the number of CPUs - but bear in mind that you're often more constrained by disk IO.
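
For instance, here is a minimal sketch of sizing the pool from the machine's CPU count - this assumes a Linux-ish system where nproc is available, and falls back to a fixed pool size of 4 otherwise:

use strict;
use warnings;
use Parallel::ForkManager;

# Assumption: `nproc` exists on this system; default to 4 workers if not.
my $cpus = `nproc 2>/dev/null`;
chomp( $cpus //= '' );
$cpus = 4 unless $cpus =~ /^\d+$/ && $cpus > 0;

my $fork_manager = Parallel::ForkManager->new($cpus);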

Key caveats when doing parallelism:

  • You can't guarantee execution sequence without messing around. It's entirely possible that loop $i==1 finishes after loop $i==2. Or before. Or whatever.

  • If you're passing information between your loops, parallelism loses efficiency - because the sender and receiver each need to synchronise. It's even worse if you need to synchronise the whole lot, so try to avoid doing that more than necessary (e.g. wherever possible, leave it until the end and collate the results - see the sketch after this list).

  • That goes double for forked code - they're separate processes, so you have to explicitly transfer anything back and forth.

  • You can get some really very fruity bugs from parallel code, because of that first point. Individual lines of code may occur in any order, so very strange things can happen. Each process runs its own statements in order, but multiple processes may well interleave. Something innocuous like open ( my $file, ">>", $output_filename ); can trip you up.

  • forking is quite limited in its ability to share data between forks. If you need to do much of this, consider threading instead.
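
To illustrate "collate the results at the end": a minimal sketch using Parallel::ForkManager's documented run_on_finish callback, where each child hands a result reference back to the parent via finish(). The squaring "work" here is just a placeholder:

use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);
my %results;    # collated in the parent only

# Runs in the parent each time a child exits.
$pm->run_on_finish( sub {
    my ( $pid, $exit_code, $ident, $signal, $core, $data_ref ) = @_;
    $results{$ident} = $data_ref if defined $data_ref;
} );

for my $i ( 0 .. 9 ) {
    $pm->start($i) and next;    # parent keeps looping
    my $sum = $i * $i;          # placeholder for the real work
    $pm->finish( 0, \$sum );    # child serialises its result back
}
$pm->wait_all_children;

print "job $_ => ${ $results{$_} }\n" for sort { $a <=> $b } keys %results;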

Threading is an alternative model of concurrency that can be valuable in certain circumstances. I generally lean towards forking being "better", but where I want to do a fair bit of inter-process communication, I'd tend to look more towards threads. See also: Perl daemonize with child daemons
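
For completeness, a hedged sketch of the question's Job1/Job2/Job3 split using Perl's core threads module - this assumes a threads-enabled perl, and work_range is a hypothetical stand-in for the real inner loops:

use strict;
use warnings;
use threads;

# Hypothetical worker: runs the inner $j/$k loops for one slice of $i.
sub work_range {
    my ( $from, $to ) = @_;
    for my $i ( $from .. $to - 1 ) {
        for my $j ( 0 .. 99 ) {
            for my $k ( 0 .. 99 ) {
                # do something ....
            }
        }
    }
}

# Job1, Job2 and Job3 from the question, each in its own thread.
my @threads = map { threads->create( \&work_range, @$_ ) }
              ( [ 0, 40 ], [ 40, 80 ], [ 80, 100 ] );

# The main program exits only once all three jobs have completed.
$_->join for @threads;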

answered Sep 23 '22 by Sobrique