Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I merge several hashes into one hash in Perl?

In Perl, how do I get this:

$VAR1 = { '999' => { '998' => [ '908', '906', '0', '998', '907' ] } }; 
$VAR1 = { '999' => { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] } }; 
$VAR1 = { '999' => { '996' => [] } }; 
$VAR1 = { '999' => { '995' => [] } }; 
$VAR1 = { '999' => { '994' => [] } }; 
$VAR1 = { '999' => { '993' => [] } }; 
$VAR1 = { '999' => { '997' => [ '986', '987', '990', '984', '989', '988' ] } }; 
$VAR1 = { '995' => { '101' => [] } }; 
$VAR1 = { '995' => { '102' => [] } }; 
$VAR1 = { '995' => { '103' => [] } }; 
$VAR1 = { '995' => { '104' => [] } }; 
$VAR1 = { '995' => { '105' => [] } }; 
$VAR1 = { '995' => { '106' => [] } }; 
$VAR1 = { '995' => { '107' => [] } }; 
$VAR1 = { '994' => { '910' => [] } }; 
$VAR1 = { '993' => { '909' => [] } }; 
$VAR1 = { '993' => { '904' => [] } }; 
$VAR1 = { '994' => { '985' => [] } }; 
$VAR1 = { '994' => { '983' => [] } }; 
$VAR1 = { '993' => { '902' => [] } }; 
$VAR1 = { '999' => { '992' => [ '905' ] } }; 

to this:

$VAR1 = { '999:' => [
 { '992' => [ '905' ] },
 { '993' => [
  { '909' => [] },
  { '904' => [] },
  { '902' => [] }
 ] },
 { '994' => [
  { '910' => [] },
  { '985' => [] },
  { '983' => [] }
 ] },
 { '995' => [
  { '101' => [] },
  { '102' => [] },
  { '103' => [] },
  { '104' => [] },
  { '105' => [] },
  { '106' => [] },
  { '107' => [] }
 ] },
 { '996' => [] },
 { '997' => [ '986', '987', '990', '984', '989', '988' ] },
 { '998' => [ '908', '906', '0', '998', '907' ] },
 { '991' => [ '913', '920', '918', '998', '916', '919', '917', '915', '912', '914' ] }
]};
like image 952
Nick Avatar asked May 04 '10 17:05

Nick


3 Answers

I think this is closer than anybody else has gotten:

This does most of what you want. I did not store things in arrays of singular hashes, as I don't feel that that is useful.

Your scenario is not a regular one. I've tried to genericize this to some extent, but was not possible to overcome the singularity of this code.

  • First of all because it appears you want to collapse everything with the same id into a merged entity (with exceptions), you have to descend through the structure pulling the definitions of the entities. Keeping track of levels, because you want them in the form of a tree.

  • Next, you assemble the ID table, merging entities as possible. Note that you had 995 defined as an empty array one place and as a level another. So given your output, I wanted to overwrite the empty list with the hash.

  • After that, we need to move the root to the result structure, descending that in order to assign canonical entities to the identifiers at each level.

Like I said, it's not anything that regular. Of course, if you still want a list of hashes which are no more than pairs, that's an exercise left to you.

use strict;
use warnings;

# subroutine to identify all elements
sub descend_identify {
    my ( $level, $hash_ref ) = @_;
    # return an expanding list that gets populated as we desecend 
    return map {
        my $item = $hash_ref->{$_};
        $_ => ( $level, $item )
            , ( ref( $item ) eq 'HASH' ? descend_identify( $level + 1, $item ) 
              :                          ()
              )
           ;
    } keys %$hash_ref
    ;
}

# subroutine to refit all nested elements
sub descend_restore { 
    my ( $hash, $ident_hash ) = @_;

    my @keys        = keys %$hash;
    @$hash{ @keys } = @$ident_hash{ @keys };
    foreach my $h ( grep { ref() eq 'HASH' } values %$hash ) {
        descend_restore( $h, $ident_hash );
    }
    return;
}

# merge hashes, descending down the hash structures.
sub merge_hashes {
    my ( $dest_hash, $src_hash ) = @_;
    foreach my $key ( keys %$src_hash ) {
        if ( exists $dest_hash->{$key} ) {
            my $ref = $dest_hash->{$key};
            my $typ = ref( $ref );
            if ( $typ eq 'HASH' ) {
                merge_hashes( $ref, $src_hash->{$key} );
            }
            else { 
                push @$ref, $src_hash->{$key};
            }
        }
        else {
            $dest_hash->{$key} = $src_hash->{$key};
        }
    }
    return;
}

my ( %levels, %ident_map, %result );

#descend through every level of hash in the list
# @hash_list is assumed to be whatever you Dumper-ed.
my @pairs = map { descend_identify( 0, $_ ); } @hash_list;

while ( @pairs ) {
    my ( $key, $level, $ref ) = splice( @pairs, 0, 3 );
    $levels{$key} |= $level;

    # if we already have an identity for this key, merge the two
    if ( exists $ident_map{$key} ) {
        my $oref = $ident_map{$key};
        my $otyp = ref( $oref );
        if ( $otyp ne ref( $ref )) {
            # empty arrays can be overwritten by hashrefs -- per 995
            if ( $otyp eq 'ARRAY' && @$oref == 0 && ref( $ref ) eq 'HASH' ) {
                $ident_map{$key} = $ref;
            }
            else { 
                die "Uncertain merge for '$key'!";
            }
        }
        elsif ( $otyp eq 'HASH' ) {
            merge_hashes( $oref, $ref );
        }
        else {
            @$oref = sort { $a <=> $b || $a cmp $b } keys %{{ @$ref, @$oref }};
        }
    }
    else {
        $ident_map{$key} = $ref;
    }
}

# Copy only the keys that do not appear at higher levels to the 
# result hash
if ( my @keys = grep { !$levels{$_} } keys %ident_map ) { 
    @result{ @keys } = @ident_map{ @keys } if @keys;

}
# then step through the hash to make sure that the entries at
# all levels are equal to the identity
descend_restore( \%result, \%ident_map );
like image 189
Axeman Avatar answered Nov 06 '22 13:11

Axeman


Use CPAN! Try Hash::Merge

# OO interface.  
my $merge = Hash::Merge->new( 'LEFT_PRECEDENT' );
my %c = %{ $merge->merge( \%a, \%b ) };

See CPAN for more info, it pretty much does everything you would want to, and is fully customizable.

like image 34
NO WAR WITH RUSSIA Avatar answered Nov 06 '22 12:11

NO WAR WITH RUSSIA


Give this recursive solution a try:

#   XXX: doesn't handle circular problems...
sub deepmerge {
    my (@structs) = @_;
    my $new;

    # filter out non-existant structs
    @structs = grep {defined($_)} @structs;

    my $ref = ref($structs[0]);
    if (not all(map {ref($_) eq $ref} @structs)) { 
        warn("deepmerge: all structs are not $ref\n");
    } 

    my @tomerge = grep {ref($_) eq $ref} @structs;
    return qr/$tomerge[0]/ if scalar(@tomerge) == 1 and $ref eq 'Regexp';
    return $tomerge[0] if scalar(@tomerge) == 1;

    if ($ref eq '') { 
        $new = pop(@tomerge); # prefer farthest right
    } 
    elsif ($ref eq 'Regexp') { 
        $new = qr/$tomerge[$#tomerge]/;
    } 
    elsif ($ref eq 'ARRAY') { 
        $new = [];
        for my $i (0 .. max(map {scalar(@$_) - 1} @tomerge)) { 
            $new->[$i] = deepmerge(map {$_->[$i]} @tomerge);
        }
    } 
    elsif ($ref eq 'HASH') { 
        $new = {};
        for my $key (uniq(map {keys %$_} @tomerge)) { 
            $new->{$key} = deepmerge(map {$_->{$key}} @tomerge);
        }
    }
    else {
        # ignore all other structures...
        $new = '';
    }

    return $new;
}

Modify it to your hearts content to achieve the desired result.

Upon further investigation, I noticed you're merging them in some different way than the above algorithm. Maybe just use this as an example then. Mine does this:

deepmerge({k => 'v'}, {k2 => 'v2'});
# returns {k => 'v', k2 => 'v2'}

And similar things for arrays.

like image 20
dlamotte Avatar answered Nov 06 '22 13:11

dlamotte