Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I cleanly turn a nested Perl hash into a non-nested one?

Tags:

perl

Assume a nested hash structure %old_hash ..

my %old_hash;
$old_hash{"foo"}{"bar"}{"zonk"} = "hello";

.. which we want to "flatten" (sorry if that's the wrong terminology!) to a non-nested hash using the sub &flatten(...) so that ..

my %h = &flatten(\%old_hash);
die unless($h{"zonk"} eq "hello");

The following definition of &flatten(...) does the trick:

sub flatten {
  my $hashref = shift;
  my %hash;
  my %i = %{$hashref};
  foreach my $ii (keys(%i)) {
    my %j = %{$i{$ii}};
    foreach my $jj (keys(%j)) {
      my %k = %{$j{$jj}};
      foreach my $kk (keys(%k)) {
        my $value = $k{$kk};
        $hash{$kk} = $value;
      }
    }
  }
  return %hash;
}

While the code given works it is not very readable or clean.

My question is two-fold:

  • In what ways does the given code not correspond to modern Perl best practices? Be harsh! :-)
  • How would you clean it up?
like image 330
knorv Avatar asked Mar 22 '10 20:03

knorv


4 Answers

Your method is not best practices because it doesn't scale. What if the nested hash is six, ten levels deep? The repetition should tell you that a recursive routine is probably what you need.

sub flatten {
    my ($in, $out) = @_;
    for my $key (keys %$in) {
        my $value = $in->{$key};
        if ( defined $value && ref $value eq 'HASH' ) {
            flatten($value, $out);
        }
        else {
            $out->{$key} = $value;
        }
    }
}

Alternatively, good modern Perl style is to use CPAN wherever possible. Data::Traverse would do what you need:

use Data::Traverse;
sub flatten {
    my %hash = @_;
    my %flattened;
    traverse { $flattened{$a} = $b } \%hash;
    return %flattened;
}

As a final note, it is usually more efficient to pass hashes by reference to avoid them being expanded out into lists and then turned into hashes again.

like image 145
rjh Avatar answered Nov 16 '22 20:11

rjh


First, I would use perl -c to make sure it compiles cleanly, which it does not. So, I'd add a trailing } to make it compile.

Then, I'd run it through perltidy to improve the code layout (indentation, etc.).

Then, I'd run perlcritic (in "harsh" mode) to automatically tell me what it thinks are bad practices. It complains that:

Subroutine does not end with "return"

Update: the OP essentially changed every line of code after I posted my Answer above, but I believe it still applies. It's not easy shooting at a moving target :)

like image 23
toolic Avatar answered Nov 16 '22 19:11

toolic


There are a few problems with your approach that you need to figure out. First off, what happens in the event that there are two leaf nodes with the same key? Does the second clobber the first, is the second ignored, should the output contain a list of them? Here is one approach. First we construct a flat list of key value pairs using a recursive function to deal with other hash depths:

my %data = (
    foo  => {bar  => {baz  => 'hello'}},
    fizz => {buzz => {bing => 'world'}},
    fad  => {bad  => {baz  => 'clobber'}},
);


sub flatten {
    my $hash = shift;
    map {
        my  $value = $$hash{$_};
        ref $value eq 'HASH' 
            ? flatten($value) 
            : ($_ =>  $value)
    } keys %$hash
}

print join( ", " => flatten \%data), "\n";
# baz, clobber, bing, world, baz, hello

my %flat = flatten \%data;

print join( ", " => %flat ), "\n";
# baz, hello, bing, world          # lost (baz => clobber)

A fix could be something like this, which will create a hash of array refs containing all the values:

sub merge {
    my %out;
    while (@_) {
        my ($key, $value) = splice @_, 0, 2;
        push @{ $out{$key} }, $value
    }
    %out
}

my %better_flat = merge flatten \%data;

In production code, it would be faster to pass references between the functions, but I have omitted that here for clarity.

like image 2
Eric Strom Avatar answered Nov 16 '22 19:11

Eric Strom


Is it your intent to end up with a copy of the original hash or just a reordered result?

Your code starts with one hash (the original hash that is used by reference) and makes two copies %i and %hash.

The statement my %i=%{hashref} is not necessary. You are copying the entire hash to a new hash. In either case (whether you want a copy of not) you can use references to the original hash.

You are also losing data if your hash in the hash has the same value as the parent hash. Is this intended?

like image 1
dawg Avatar answered Nov 16 '22 20:11

dawg