Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I sort a Perl hash that has tons of data?

Tags:

sorting

hash

perl

I am sorting a hash in Perl. I encountered an Out of memory error when running my Perl Script:

foreach $key (sort (keys(%hash))) {
   ....
}

How do I sort a hash that has tons of data?

like image 864
syker Avatar asked Jan 23 '26 04:01

syker


1 Answers

sort keys %hash is inefficient for a large %hash in that, memory wise, its roughly equivalent to:

my @keys = keys %hash;
@keys = sort @keys;

In that it has to keep three copies of the keys in memory while doing the sorting (one in the hash, one in the list of keys, one in the sorted list being created). foreach's memory optimizations for iterators do not apply.

Since the hash is so large, the best option is to get it entirely out of memory. Stick it in a BerkeleyDB file. And if you want to keep the keys in order a hash isn't the best option, a tree is. I'd suggest using a Berkeley BTree file. Trees will efficiently keep your data sorted like an array while providing fast lookup like a hash.

Here's an example using BerkeleyDB. DB_File is simpler and better documented but does not take advantage of modern features of BerkeleyDB. YMMV.

use BerkeleyDB;

my $db  = tie my %hash, 'BerkeleyDB::Btree',
              -Filename => "your.db",
              -Compare  => sub { $_[1] cmp $_[0] },
              -Flags    => DB_CREATE;

-Compare illustrates how to supply your own sorting function. The tied interface will be sluggish. Unless you need it to act like a hash, use the object interface.

like image 107
Schwern Avatar answered Jan 24 '26 23:01

Schwern



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!