Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Perl speed comparison on using hash and hash reference

Tags:

hash

perl

I am rying to compare if it is better to use hashes or reference to hashes, hash ref as I understand just a pointer to the hash itself, so I thought should be no speed difference.

I did a basic benchmark and I found the hash refs are slower than using the hash direct by average 20-27%.

Here is the basic benchmark code I used:

use Benchmark qw(:all);

cmpthese(10_000_000, {
    hash            =>  sub { my %hash = (); },
    hashref =>  sub { my $hahref = {}; }
});

cmpthese(10_000_000, {
    hash            =>  sub {
                                    my %hash;
                                    $hash{fname}="first name";
                                    $hash{mname}="middle name";
                                    $hash{lname}="last name";
                                },

    hashref =>  sub {
                                    my $hahref;
                                    $hahref->{fname}="first name"; 
                                    $hahref->{mname}="middle name";
                                    $hahref->{lname}="last name"; 
                                },

    hashrefs    =>  sub {
                                    my $hahref;
                                    $$hahref{fname}="first name"; 
                                    $$hahref{mname}="middle name";
                                    $$hahref{lname}="last name"; 
                                },
});

and here is the benchmark results on laptop dell, windows 8, core i7, 16MB RAM:

             Rate hashref    hash
hashref 5045409/s      --    -17%
hash    6045949/s     20%      --

             Rate hashrefs  hashref     hash
hashrefs 615764/s       --      -2%     -21%
hashref  625978/s       2%       --     -19%
hash     775134/s      26%      24%       --

Output completed (1 min 6 sec consumed)

My question is, if my benchmark is correct and the hash refs are so slow, why most modules like DBI use hash refs to return results. Also most modules accepts hash refs not hashes for arguments and also return hash refs and not hashes.

like image 219
daliaessam Avatar asked Dec 04 '22 06:12

daliaessam


1 Answers

Hashes are faster to access elements from; hashrefs are faster to pass as arguments to a function, or return as the result of a function. This makes sense if you think about it:

  • A hashref is basically a pointer to a hash, so when Perl sees $href->{xyz}, it needs to follow the pointer to find the hash, and then find element xyz in the hash. When Perl sees $hash{xyz} it doesn't need to do that initial pointer-following bit; it can find element xyz straight away.

  • Hashes cannot be directly passed to or from subs; they need to be flattened into a list of scalars. If a hash has, say four keys and four values, then passing it to a sub means passing a list of eight scalars to the function. Inside the function, you'll probably have something like my %args = @_ which copies those eight scalars into a new hash. Lots of work to be done. Passing a hashref is just a matter of passing a single scalar, so it's faster.

Mostly this is micro-optimization, and you should just choose whichever data structure makes the most sense for your program. However for those occasions when you really need to eke out every bit of speed you can, it is possible to have the best of both worlds...

Let's say you have a function which needs to accept a hash (or maybe a hashref; you haven't decided yet) and needs to add up some of the keys. Here are your original two options:

sub add_hash {
    my %hash = @_;
    return $hash{foo} + $hash{bar} + $hash{baz};
}

sub add_hashref {
    my ($href) = @_;                                    # faster than add_hash
    return $href->{foo} + $href->{bar} + $href->{baz};  # slower than add_hash
}

Now let's pull out Data::Alias. This is a module that allows us to create a lexical variable which acts as an alias for another. In particular, we can make a lexical hash variable which acts like an alias for the hash which is pointed to by a hashref.

use Data::Alias;

sub add_hashref_2 {
    my ($href) = @_;                               # faster than add_hash
    alias my %hash = %$href;                       # ... magic happens ...
    return $hash{foo} + $hash{bar} + $hash{baz};   # faster than add_hashref
}

Or better still:

use Data::Alias;

sub add_hashref_3 {
    alias my %hash = %{ $_[0] };
    return $hash{foo} + $hash{bar} + $hash{baz};
}

... which avoids the initial list assignment.

I stress that this is micro-optimization. There are usually far better ways to speed up your code - memoization, radical algorithm changes, rewriting selected hot code paths in XS, etc. But there are some (very limited) occasions when this sort of magic can help.

like image 143
tobyink Avatar answered Dec 27 '22 02:12

tobyink