Confusion about proper usage of dereference in Perl

I noticed the other day, while altering values in a hash, that when you dereference a hash in Perl you are actually making a copy of that hash. To confirm, I wrote this quick little script:

#! perl
use warnings;
use strict;

my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;

if($hRef eq $h2Ref) {
  print "\n\tThey're the same $hRef $h2Ref";
}
else {
  print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";

The output:

    They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)

This made me realize that some of my scripts may not be behaving as I expect. Why does it work this way in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing it lets you alter the values of the hash being dereferenced. Instead I'm making copies all over the place with no real need beyond making the syntax a little more obvious.

The fact that I hadn't even noticed this until now shows it's probably not a big deal (I don't need to go fix all of my scripts, but it matters going forward). I think it's going to be pretty rare to see a noticeable performance difference from this, but that doesn't alter the fact that I'm still confused.

Is this by design in Perl? Is there some explicit reason for it that I don't know about, or is this just known behavior that you, as the programmer, are expected to know and write scripts around?

Dave asked Jul 18 '11 22:07


3 Answers

The problem is that you are making a copy of the hash to work with in this line:

my %h2 = %{$hRef};

And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.

In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
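As a minimal sketch of that flattening and reassembly (the names %orig, %copy, @flat, and $ref are illustrative and not from the question):

```perl
use strict;
use warnings;

my %orig = (a => 1, b => 2);
my $ref  = \%orig;

# In list context the hash flattens to (key, value, key, value, ...)
my @flat = %$ref;          # 4 elements; pair order is not guaranteed

# Assigning that list builds a brand-new hash
my %copy = %$ref;
$copy{a} = 100;            # touches only the copy

# Going through the reference touches the original
$ref->{b} = 200;

print scalar(@flat), " $orig{a} $orig{b} $copy{a} $copy{b}\n";   # prints "4 1 200 100 2"
```

The copy keeps the value of b it had at the moment of assignment, while the write through $ref shows up in %orig.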

What you want to do is work with the reference directly.

for (keys %$href) {...}
for (values %$href) {...}

my $x = $href->{some_key};
# or
my $x = $$href{some_key};

$$href{new_key} = 'new_value';

When working with a normal hash, the sigil is % when talking about the entire hash, $ when talking about a single element, and @ when talking about a slice. Each of these sigils is then followed by an identifier.

 %hash          # whole hash
 $hash{key}     # element
 @hash{qw(a b)} # slice

To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:

%$href          # whole hash
$$href{key}     # element
@$href{qw(a b)} # slice

Each of these could be written in a more verbose form as:

%{$href}
${$href}{key}
@{$href}{qw(a b)}

This is again a substitution of the string '$href' for 'hash' as the name of the identifier in the braced forms:

%{hash}
${hash}{key}
@{hash}{qw(a b)} 

You can also use a dereferencing arrow when working with an element:

$href->{key}  # exactly the same as $$href{key}

But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
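A short sketch can confirm that these spellings refer to the same data. The variable names and sample values here are made up for illustration:

```perl
use strict;
use warnings;

my %hash = (a => 1, b => 2, c => 3);
my $href = \%hash;

# element: three spellings of the same lookup
my @elem = ($$href{a}, ${$href}{a}, $href->{a});

# slice: the @ sigil in front of the reference
my @slice         = @$href{qw(a b)};
my @slice_verbose = @{$href}{qw(b c)};

# whole hash: keys in scalar context gives the number of keys
my $count = keys %$href;

# writing through a slice changes the original hash
@$href{qw(a b)} = (10, 20);

print "@elem | @slice | @slice_verbose | $count | $hash{a} $hash{b}\n";
# prints "1 1 1 | 1 2 | 2 3 | 3 | 10 20"
```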

So to sum up, any time you write something like this:

my @array = @$array_ref;
my %hash  = %$hash_ref;

You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
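The "first level" part matters when the hash holds references. In this sketch (sample data invented for illustration), the top-level values are copied, but the nested array is still shared between the original and the copy:

```perl
use strict;
use warnings;

my %orig     = (name => 'x', list => [1, 2]);
my $hash_ref = \%orig;

my %copy = %$hash_ref;      # copies the top-level key/value pairs only

$copy{name} = 'y';          # plain value: only the copy changes
push @{ $copy{list} }, 3;   # nested array: same array in both hashes

print "$orig{name} $copy{name} @{ $orig{list} }\n";   # prints "x y 1 2 3"
```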


If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.

 sub some_sub {
    my $hash_ref = shift;
    our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
    local *hash = \%$hash_ref;
        # install the hash ref into the glob
        # the `\%` bit ensures we have a hash ref

    # use %hash here, all changes will be made to $hash_ref

 }  # local unwinds here, restoring the global to its previous value if any

That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the CPAN module Data::Alias.
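For the glob-based alias, a complete call might look like the following sketch. The sub name add_flag and the sample data are hypothetical; only the our/local pattern comes from the code above:

```perl
use strict;
use warnings;

our %hash;   # package variable that will temporarily alias the caller's hash

sub add_flag {
    my $hash_ref = shift;
    local *hash = $hash_ref;    # %hash now names the caller's hash
    $hash{seen} = 1;            # the write lands in the caller's data
    return scalar keys %hash;
}   # local unwinds here, and %hash is the (empty) global again

my %data = (a => 1);
my $n = add_flag(\%data);

print "$n $data{seen}\n";   # prints "2 1"
```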

Eric Strom answered Nov 14 '22 23:11


You are confusing two actions: dereferencing, which does not inherently create a copy, and using a hash in list context and assigning the resulting list, which does. $hashref->{'a'} is a dereference, and it most certainly does affect the original hash. The same is true for $#$arrayref or values(%$hashref).

Without the assignment, %$hashref in list context is a mixed beast: the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:

$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal

vs.

$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal

but %$hashref isn't acting any differently than %hash here.
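The same distinction can be spelled out as a small script, as a sketch with made-up data, using values and keys separately so the two behaviors are easier to tell apart:

```perl
use strict;
use warnings;

my $x = { a => 1, b => 2 };

# values() yields aliases, so assigning to $_ edits the hash itself
$_ *= 10 for values %$x;

# keys() yields copies, so changing $_ leaves the hash keys alone
my @seen;
for (keys %$x) {
    $_ = uc $_;
    push @seen, $_;
}

print "a=$x->{a},b=$x->{b}\n";   # prints "a=10,b=20"
```

After the second loop, @seen holds the uppercased copies, but the hash still has only the keys a and b.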

ysth answered Nov 15 '22 01:11


No, dereferencing does not create a copy of the referent. It's my that creates a new variable.

$ perl -E'
   my %h1; my $h1 = \%h1;
   my %h2; my $h2 = \%h2;
   say $h1;
   say $h2;
   say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0

$ perl -E'
   my %h;
   my $h1 = \%h;
   my $h2 = \%h;
   say $h1;
   say $h2;
   say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1

No, $#{$someArrayHashRef} does not create a new array.
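A quick way to check this is to assign to $#{...} through a reference and look at the original array afterwards. This is a sketch; @items and $aref are illustrative names:

```perl
use strict;
use warnings;

my @items = (1 .. 5);
my $aref  = \@items;

my $last = $#{$aref};   # last index of @items, read through the reference
$#{$aref} = 1;          # truncates @items itself to two elements

print "$last @items\n";   # prints "4 1 2"
```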

ikegami answered Nov 15 '22 00:11