I have about 100 million rows such as:
A : value of A
B : value of B
|
|
|
Z : value of Z (up to 100 million unique entries)
Currently, each time I run my program I load the entire file into a hash, which takes some time. At run time I need to look up the value for A, B, etc., given the key.
I am wondering whether I can build the hash once and store it as a binary data structure, or index the file. What would be possible in Perl with the least programming?
Thanks! -Abhi
I suggest an on-disk key/value database. Thanks to Perl's tie function, one can be used just like a normal in-memory hash. For a very large data set it can be faster overall than a plain Perl hash, because the data does not have to be parsed and loaded into memory on every run, and it is saved to and loaded from disk automatically.
BerkeleyDB is an old favourite:
use BerkeleyDB;
# Make %db an on-disk database stored in database.dbm. Create file if needed
tie my %db, 'BerkeleyDB::Hash', -Filename => "database.dbm", -Flags => DB_CREATE
    or die "Couldn't tie database: $BerkeleyDB::Error";
$db{foo} = 1; # set value
print $db{foo}, "\n"; # get value
for my $key (keys %db) {
    print "$key -> $db{$key}\n"; # iterate over keys and values
}
%db = (); # wipe
Changes to the database are automatically saved to disk and will persist through multiple invocations of your script.
Check the perldoc for options, but the most important are:
# Increase memory allocation for database (increases performance), e.g. 640 MB
tie my %db, 'BerkeleyDB::Hash', -Filename => $filename, -Cachesize => 640*1024*1024;
# Open database in readonly mode
tie my %db, 'BerkeleyDB::Hash', -Filename => $filename, -Flags => DB_RDONLY;
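Applied to the question, the build-once, look-up-later workflow could be sketched roughly as follows. This is only a minimal sketch: it assumes every input line has the form "key : value", and the helper names build_db and lookup are hypothetical, not part of BerkeleyDB.

```perl
use strict;
use warnings;
use BerkeleyDB;

# One-time step: parse "key : value" lines from a text file into an
# on-disk hash. The line format is an assumption based on the question.
sub build_db {
    my ($source, $dbfile) = @_;
    tie my %db, 'BerkeleyDB::Hash',
        -Filename => $dbfile,
        -Flags    => DB_CREATE
        or die "Couldn't tie database: $BerkeleyDB::Error";
    open my $in, '<', $source or die "Can't open $source: $!";
    while (my $line = <$in>) {
        chomp $line;
        # Split on the first colon only, so values may contain colons.
        my ($key, $value) = split /\s*:\s*/, $line, 2;
        next unless defined $value;   # skip lines without a colon
        $db{$key} = $value;
    }
    close $in;
    untie %db;                        # flush and close the database
    return;
}

# Every later run: open the existing database read-only and fetch only
# the keys that are needed, without loading the whole data set.
sub lookup {
    my ($dbfile, @keys) = @_;
    tie my %db, 'BerkeleyDB::Hash',
        -Filename => $dbfile,
        -Flags    => DB_RDONLY
        or die "Couldn't tie database: $BerkeleyDB::Error";
    my %found = map { $_ => $db{$_} } @keys;   # missing keys map to undef
    untie %db;
    return \%found;
}
```

You would call build_db('data.txt', 'database.dbm') once (both file names are placeholders), and afterwards only lookup('database.dbm', 'A', 'B') in the program that needs the values.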
A more complex but much faster database library would be Tokyo Cabinet, and there are of course many other options (this is Perl after all...)