Shared readonly memory between Perl processes

I wish to make a Perl program of mine use multiple cores. It progressively reads query input and compares chunks of it against a read-only data structure that is loaded from file into memory at the start of each run. That data structure, typically a few gigabytes, is a small set of packed strings used by small C routines. When processes are forked, everything is copied, which on a multi-core machine quickly exhausts the RAM. I have tried several non-standard modules, but they all lead to slowness and/or blow up the RAM. I thought that, for read-only data, Perl would not insist on making copies. Other languages can do it. Does anyone have ideas?

Niels Larsen asked Oct 03 '12 14:10

1 Answer

Fork doesn't normally copy memory until it's modified (search for "copy-on-write" or COW). Are you sure you are measuring memory usage correctly? Take the difference between before/after values from `free` rather than reading per-process figures from `top`, which counts shared pages against every process and so makes totals look inflated.
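To see the pattern in action, here is a minimal sketch (the variable names and sizes are illustrative, not from the question): load the big read-only data once in the parent, then fork workers that only read it. As long as nothing writes to the data, the kernel keeps the physical pages shared. One Perl-specific caveat to be aware of: operations that touch a value's internal flags or reference count can dirty its page, so sharing is not always as complete as in C.

```perl
#!/usr/bin/perl
# Sketch: build the data BEFORE forking so children inherit it via COW.
use strict;
use warnings;

# Stand-in for the gigabytes of packed strings (1 MB here for the demo).
my $data = 'x' x 1_000_000;

my @pids;
for my $worker (1 .. 4) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: read-only access; pages stay shared with the parent
        # as long as neither side writes to $data.
        my $len = length $data;
        print "worker $worker sees $len bytes\n";
        exit 0;
    }
    push @pids, $pid;
}

# Parent: reap all children before reporting.
waitpid($_, 0) for @pids;
print "all 4 workers done\n";
```

The key design point is the ordering: anything allocated after the `fork()` is private to each process, so the shared structure must be fully built first.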

EDIT - example script

Try running the following with settings like:

    ./fork_mem_usage 5 10000
    ./fork_mem_usage 25 10000
    ./fork_mem_usage 5 100000
    ./fork_mem_usage 25 100000

If the first increase is bigger than the subsequent ones then fork is using copy-on-write. It almost certainly is (except for Windows of course).

#!/usr/bin/perl
use strict;
use warnings;

# Usage: ./fork_mem_usage <num_children> <array_size>
my $num_kids  = shift @ARGV;
my $arr_size  = shift @ARGV;
print "$num_kids x $arr_size\n";

# Build a large array in the parent before forking; with copy-on-write
# the children should share these pages until someone writes to them.
my @big_array = ('abcdefg') x $arr_size;
die "Array wrong length" unless ($arr_size == @big_array);

print_mem_usage('Start');

for my $i (1..$num_kids) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {
        # Parent: report system-wide memory use every fifth child.
        if ($i % 5 == 0) {
            print_mem_usage($i);
        }
    }
    else {
        # Child: keep the inherited pages mapped for a while, then exit.
        sleep(5);
        exit;
    }
}

print_mem_usage('End');
exit;

# Print used memory (MB) excluding buffers/cache, as reported by free.
# Note: the "buffers/cache" line appears in older versions of free;
# on newer procps try e.g.  free -m | awk '/^Mem:/{print $3}'
sub print_mem_usage {
    my $msg = shift;
    print "$msg: ";
    system q(free -m | grep buffers/cache | awk '{print $3}');
}
Richard Huxton answered Oct 20 '22 13:10