Can BerkeleyDB in Perl handle a hash of hashes of hashes (up to n)?

I have a script that uses a hash with four strings as keys, whose values are hashes. These hashes also have four strings as keys, whose values are again hashes. This pattern continues for n-1 levels, where n is determined at run-time. The nth-level hashes contain integer values (rather than the usual hash references).

I installed the BerkeleyDB module for Perl so I can use disk space instead of RAM to store this hash. I assumed that I could simply tie the hash to a database, and it would work, so I added the following to my code:

my %tags = () ; 
my $file = "db_tags.db" ; 
unlink $file; 


tie %tags, "BerkeleyDB::Hash", 
        -Filename => $file, 
        -Flags => DB_CREATE
     or die "Cannot open $file\n" ;

However, I get the error:

Can't use string ("HASH(0x1a69ad8)") as a HASH ref while "strict refs" in use at getUniqSubTreeBDB.pl line 31, line 1.

To test, I created a new script with the code above that ties the hash to a file. Then I added the following:

my $href = \%tags; 
$tags{'C'} = {} ;

And it ran fine. Then I added:

$tags{'C'}->{'G'} = {} ;

And it gave pretty much the same error. I am thinking that BerkeleyDB cannot handle the type of data structure I am creating. Maybe it was able to handle the first level (C->{}) in my test because it was just a regular key -> scalar?

Anyways, any suggestions or affirmations of my hypothesis would be appreciated.

gravitas asked Mar 21 '12 16:03


3 Answers

Use DBM::Deep.

use strict;
use warnings;
use DBM::Deep;

my $db = DBM::Deep->new( "foo.db" );

$db->{mykey} = "myvalue";
$db->{myhash} = {};
$db->{myhash}->{subkey} = "subvalue";

print $db->{myhash}->{subkey} . "\n";

The code I provided yesterday would work fine with this.

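# Follow the given keys down the nested hash and return a reference
# to the slot at the end of the path.
sub get_node {
   my $p = \shift;                  # reference to the root container
   $p = \( ($$p)->{$_} ) for @_;    # descend one level per key
   return $p;
}

my @seqs = qw( CG CA TT CG );

my $tree = DBM::Deep->new("foo.db");
# Split each sequence into characters and increment the counter at that path.
++${ get_node($tree, split //) } for @seqs;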

ikegami answered Oct 30 '22 05:10


No. BerkeleyDB stores pairs of one key and one value, where both are arbitrary bytestrings. If you store a hashref as the value, it'll store the string representation of a hashref, which isn't very useful when you read it back (as you noticed).
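To see where the error in the question comes from, here is a minimal sketch that does not need BerkeleyDB at all. It assumes only that the store keeps the string form of whatever it is given; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $ref    = { G => 1 };
my $stored = "$ref";          # what a string-only store ends up keeping

print "$stored\n";            # something like HASH(0x...)

# Reading the value back yields a plain string, not a reference, so
# dereferencing it dies under "strict refs".
my $ok  = eval { my %h = %$stored; 1 };
my $err = $@;
print "error: $err" unless $ok;
```

The message printed at the end has the same form as the one in the question.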

The MLDBM module can do something like you describe, but it works by serializing the top-level hashref to a string and storing that in the DBM file. This means it has to read/write the entire top-level hashref every time you access or change a value in it.

Depending on your application, you may be able to combine your keys into a single string, and use that as the key for your DBM file. The main limitation with that is that it's difficult to iterate over the keys of one of your interior hashes.

You might use the semi-obsolete multidimensional array emulation for this. $foo{$a,$b,$c} is interpreted as $foo{join($;, $a, $b, $c)}, and that works with tied hashes also.
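A rough sketch of the combined-key idea, using an ordinary hash in place of the tied one (an assumption made so that it runs without any DBM module). The sequences are borrowed from the other answer; because the depth is only known at run-time, the key is built with an explicit join rather than the literal `$foo{$a,$b,$c}` form.

```perl
use strict;
use warnings;

my %tags;          # stand-in for a hash tied to a DBM file
my $sep = $;;      # subscript separator, "\034" by default

# Count each sequence under one flat key built from its characters.
for my $seq (qw( CG CA TT CG )) {
    my @path = split //, $seq;
    $tags{ join $sep, @path }++;
}

# $tags{'C','G'} is the same slot as $tags{ join $;, 'C', 'G' }.
print "C,G seen $tags{'C','G'} times\n";

# Recover the individual keys by splitting the flat key again.
for my $flat (sort keys %tags) {
    my @path = split /\Q$sep\E/, $flat;
    print join('->', @path), " = $tags{$flat}\n";
}
```

Listing everything under one interior node (say, all keys below `C`) means scanning every flat key for the matching prefix, which is the iteration limitation mentioned above.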

cjm answered Oct 30 '22 04:10


No; it can only store strings. But you can use the ->filter_fetch_value and ->filter_store_value methods to define "filters" that automatically freeze arbitrary structures to strings before storing and convert them back when fetching. There are analogous hooks for marshalling and unmarshalling non-string keys.

Beware though: using this method to store objects that share subobjects will not preserve the sharing. For example:

$a = [1, 2, 3];
$g = { array => $a };
$h = { array => $a };
$db{g} = $g;
$db{h} = $h;

@$a = ();
push @{$db{g}{array}}, 4;

print @{$db{g}{array}};  # prints 123, not 4: the push changed only a fetched copy
print @{$db{h}{array}};  # prints 123, not 4

%db here is a tied hash; if it were an ordinary hash the two prints would both print 4.
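A minimal sketch of the filter approach follows. To keep it runnable with core modules only, it swaps in SDBM_File, which offers the same filter method names, and uses Storable for the freezing; that pairing, the temporary file location, and the variable names are assumptions for illustration, not part of the setup in the question. SDBM_File also limits the size of each key/value pair, so it only suits small structures.

```perl
use strict;
use warnings;
use Fcntl;                          # O_RDWR, O_CREAT
use SDBM_File;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
my $db  = tie my %db, 'SDBM_File', "$dir/tags", O_RDWR | O_CREAT, 0666
    or die "Cannot open database: $!";

# Inside a filter the value is in $_ and is replaced in place.
$db->filter_store_value( sub { $_ = freeze($_) } );   # structure -> string
$db->filter_fetch_value( sub { $_ = thaw($_)   } );   # string -> structure

$db{C} = { G => 1, A => 2 };

# Every fetch returns a fresh copy, so this change is silently lost.
$db{C}{X} = 9;

# To change a nested value: fetch, modify, then store the whole value back.
my $node = $db{C};
$node->{T} = 3;
$db{C} = $node;

print join( ',', sort keys %{ $db{C} } ), "\n";
```

The fetch-modify-store step is the price of this approach: the database never sees changes made to a fetched copy.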

Mark Dominus answered Oct 30 '22 05:10