Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize two-dimensional hash traversing in Perl?

I have a hash of hashes %signal_db. A typical element is: $signal_db{$cycle}{$key}. There are 10,000s of signals, and 10,000s of keys.

Is there any way to optimize (timewise) this piece of code:

foreach my $cycle (sort numerically keys %signal_db) {
    foreach my $key (sort keys %{$signal_db{$cycle}}) {
        print $signal_db{$cycle}{$key}.$key."\n";
    }
}

The elements have to be printed in the same order as in my code.

like image 775
Igor Avatar asked Jan 15 '12 16:01

Igor


1 Answers

Two micro optimizations: map inner hash instead of constant dereferencing and buffer instead of constant print. It's possible to get rid of sorting using alternative storage formats, tested two variants. Results:

               Rate     original         try3  alternative alternative2
original     46.1/s           --         -12%         -21%         -32%
try3         52.6/s          14%           --         -10%         -22%
alternative  58.6/s          27%          11%           --         -13%
alternative2 67.5/s          46%          28%          15%           --

Conclusion:

It's better to use presorted storage format, but without C win would probably be within 100% (on my test dataset). Provided information about data suggests that keys in outer hash are almost sequential numbers, so this cries for array.

Script:

#!/usr/bin/env perl

use strict; use warnings;
use Benchmark qw/timethese cmpthese/;

my %signal_db = map { $_ => {} } 1..1000;
%$_ = map { $_ => $_ } 'a'..'z' foreach values %signal_db;

my @signal_db = map { { cycle => $_ } } 1..1000;
$_->{'samples'} = { map { $_ => $_ } 'a'..'z' } foreach @signal_db;

my @signal_db1 = map { $_ => [] } 1..1000;
@$_ = map { $_ => $_ } 'a'..'z' foreach grep ref $_, @signal_db1;

use Sort::Key qw(nsort);

sub numerically { $a <=> $b }

my $result = cmpthese( -2, {
    'original' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (sort numerically keys %signal_db) {
            foreach my $key (sort keys %{$signal_db{$cycle}}) {
                print $out $signal_db{$cycle}{$key}.$key."\n";
            }
        }
    },
    'try3' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (map $signal_db{$_}, sort numerically keys %signal_db) {
            my $tmp = '';
            foreach my $key (sort keys %$cycle) {
                $tmp .= $cycle->{$key}.$key."\n";
            }
            print $out $tmp;
        }
    },
    'alternative' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (map $_->{'samples'}, @signal_db) {
            my $tmp = '';
            foreach my $key (sort keys %$cycle) {
                $tmp .= $cycle->{$key}.$key."\n";
            }
            print $out $tmp;
        }
    },
    'alternative2' => sub {
        open my $out, '>', 'tmp.out';
        foreach my $cycle (grep ref $_, @signal_db1) {
            my $tmp = '';
            foreach (my $i = 0; $i < @$cycle; $i+=2) {
                $tmp .= $cycle->[$i+1].$cycle->[$i]."\n";
            }
            print $out $tmp;
        }
    },
} );
like image 177
ruz Avatar answered Sep 19 '22 12:09

ruz