
Garbage collection in Perl threads

This question is a point of curiosity, as one of the two programs below works.

I'm using Image::Magick to resize a number of photos. To save a bit of time, I work on each photo in its own thread, and use a semaphore to limit the number of threads working simultaneously. Originally I allowed every thread to run at once, but the script would quickly allocate 3.5 GB for all the photos (I only have 2 GB available), and it would run 5x slower than normal because of all the swapping to disk.

The working, semaphore version code looks something like this:

use threads;
use Thread::Semaphore;
use Image::Magick;

my $s = Thread::Semaphore->new(4);
foreach ( @photos ) {
    threads->create( \&launch_thread, $s );
}
foreach my $thr ( reverse threads->list() ) {
    $thr->join();
}

sub launch_thread {
    my $s = shift;
    $s->down();
    my $image = Image::Magick->new();

    # do memory-heavy work here

    $s->up();
}

This quickly allocates 500MB, and runs quite nicely without ever requiring more. (The threads are joined in reverse order to make a point.)

I wondered if there might be overhead from launching 80 threads simultaneously and blocking most of them, so I altered my script to block the main thread:

my $s = Thread::Semaphore->new(4);
foreach ( @photos ) {
    $s->down();
    threads->create( \&launch_thread, $s );
}
foreach my $thr ( threads->list() ) {
    $thr->join();
}

sub launch_thread {
    my $s = shift;
    my $image = Image::Magick->new();

    # do memory-heavy work here

    $s->up();
}

This version starts fine, but gradually accumulates the 3.5 GB of memory that the original, unthrottled version used. It's faster than running all threads at once, but still quite a bit slower than the version that blocks inside the threads.

My first guess was that the memory used by a thread isn't freed until join() is called on it, and as it's the main thread that blocks, no threads are freed until they've all been allocated. However, in the first, working version, the threads pass the guard in a more-or-less random order, but join in reverse order. If my guess is correct, then, many more than the four running threads should be waiting to be join()ed at any time, and this version should be slower as well.
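A minimal sketch of how that guess could be checked, assuming a Perl built with ithreads: the main thread reaps whatever has already finished inside the launch loop, using `threads->list(threads::joinable)`, instead of joining everything at the end. The body of `launch_thread` and the file names are hypothetical stand-ins for the real Image::Magick work.

```perl
use strict;
use warnings;
use threads;
use Thread::Semaphore;

my $s = Thread::Semaphore->new(4);

# Hypothetical stand-in for the memory-heavy Image::Magick work.
sub launch_thread {
    my ($s, $photo) = @_;
    my $result = "resized:$photo";   # placeholder for the real resize
    $s->up();                        # free a slot so the main thread can continue
    return;
}

my @photos = map { "photo$_.jpg" } 1 .. 20;   # hypothetical file names
my $joined = 0;

foreach my $photo (@photos) {
    $s->down();                      # main thread blocks until a slot is free
    threads->create(\&launch_thread, $s, $photo);

    # Reap threads that have already finished, rather than waiting for the end.
    for my $thr (threads->list(threads::joinable)) {
        $thr->join();
        $joined++;
    }
}

# Join the threads that were still running when the loop ended.
for my $thr (threads->list()) {
    $thr->join();
    $joined++;
}

print "$joined threads joined\n";
```

If memory stays flat with this change, that would be consistent with the guess that a finished thread's memory is held until it is joined; it would not prove it.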

So why are these two versions so different?

pconley asked Oct 05 '12 19:10



1 Answer

You don't need to create more than 4 threads. One major benefit is that this means 76 fewer copies of the Perl interpreter. Also, it makes the reaping order rather moot since all the threads finish at more or less the same time.

use threads;
use Thread::Queue qw( );
use Image::Magick qw( );

use constant NUM_WORKERS => 4;

sub process {
   my ($photo) = @_;
   ...
}

{
   my $request_q = Thread::Queue->new();

   my @threads;
   for (1..NUM_WORKERS) {
       push @threads, async {
          # defined() keeps a false-valued item such as "0" from ending the loop early
          while (defined(my $photo = $request_q->dequeue())) {
             process($photo);
          }
       };
   }

   $request_q->enqueue($_) for @photos;
   $request_q->enqueue(undef) for 1..NUM_WORKERS;  # one stop marker per worker
   $_->join() for @threads;
}
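For reference, here is a self-contained sketch of the same worker-pool pattern that can be run without Image::Magick. The body of `process`, the `run_pool` helper, the result queue, and the file names are all hypothetical additions for illustration; only the queue-plus-fixed-workers structure comes from the code above.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

use constant NUM_WORKERS => 4;

# Hypothetical stand-in for the real per-photo work (e.g. an Image::Magick resize).
sub process {
    my ($photo) = @_;
    return "resized:$photo";
}

# Hypothetical helper: runs every photo through a fixed pool of workers.
sub run_pool {
    my (@photos) = @_;
    my $request_q = Thread::Queue->new();
    my $result_q  = Thread::Queue->new();

    my @workers;
    for (1 .. NUM_WORKERS) {
        push @workers, threads->create(sub {
            # Each worker pulls photos until it receives an undef stop marker.
            while (defined(my $photo = $request_q->dequeue())) {
                $result_q->enqueue(process($photo));
            }
        });
    }

    $request_q->enqueue($_) for @photos;
    $request_q->enqueue(undef) for 1 .. NUM_WORKERS;   # one stop marker per worker
    $_->join() for @workers;

    # All workers have exited, so the result queue can be drained without blocking.
    my @results;
    while (defined(my $r = $result_q->dequeue_nb())) {
        push @results, $r;
    }
    my @sorted = sort @results;
    return @sorted;
}

my @done = run_pool(map { "photo$_.jpg" } 1 .. 10);
print scalar(@done), " photos processed by ", NUM_WORKERS, " workers\n";
```

Only four interpreter copies exist at any time, regardless of how many photos are queued.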
ikegami answered Sep 28 '22 10:09
