
Pass by value vs pass by reference for a Perl hash

I'm using a subroutine to make a few different hash maps. I'm currently passing the hashmap by reference, but this conflicts when doing it multiple times. Should I be passing the hash by value or passing the hash reference?

use strict;
use warnings;

sub fromFile($){
    local $/;
    local our %counts =();
     my $string = <$_[0]>;
    open FILE, $string or die $!;
    my $contents = <FILE>;
    close FILE or die $!;

    my $pa = qr{
        ( \pL {2} )
        (?{
            if(exists $counts{lc($^N)}){
                $counts{lc($^N)} = $counts{lc($^N)} + 1;
            }
            else{
                $counts{lc($^N)} = '1';
            }
        })
        (*FAIL)
    }x;

     $contents =~ $pa;

    return %counts;

}

sub main(){
    my %english_map = &fromFile("english.txt");
    #my %german_map = &fromFile("german.txt");
}

main();

When I run the different txt files individually I get no problems, but with both I get some conflicts.

Johann asked Mar 09 '13 22:03


People also ask

Is Perl pass by value or pass by reference?

Perl always passes by reference. Specifically, Perl aliases each of the arguments to the elements of @_. It's just that sometimes the caller passes temporary scalars.

What is a hash reference in Perl?

A hash is a basic data type in Perl. It uses keys to access its contents. "Hash ref" is short for a reference to a hash. References are scalars, that is, simple values: a hash ref is a scalar that essentially holds a pointer to the actual hash.

How do I pass a variable by reference in Perl?

To create a reference to an array or hash, simply precede the variable's name with a backslash (\). To dereference, that is, to get back to the thing the reference points to, precede the reference's name with the appropriate symbol: an at sign ( @ ) for arrays, or a percent sign ( % ) for hashes.


3 Answers

Three comments:

Don't confuse passing a reference with passing by reference

Passing a reference is passing a scalar containing a reference (a type of value).

The compiler passes an argument by reference when it passes the argument without making a copy.

The compiler passes an argument by value when it passes a copy of the argument.

Arguments are always passed by reference in Perl

Modifying a function's parameters (the elements of @_) will change the corresponding variable in the caller. That's one of the reasons the convention to copy the parameters exists.

my ($x, $y) = @_;   # This copies the args.

Of course, the primary reason for copying the parameters is to "name" them, but it saves us from some nasty surprises we'd get by using the elements of @_ directly.

$ perl -E'sub f { my ($x) = @_; "b"=~/(.)/; say $x;    } "a"=~/(.)/; f($1)'
a

$ perl -E'sub f {               "b"=~/(.)/; say $_[0]; } "a"=~/(.)/; f($1)'
b
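The same aliasing can be seen without regex variables. This is only a minimal sketch; the sub names double_in_place and double_copy are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: modifies its first argument in place through @_.
sub double_in_place {
    $_[0] *= 2;          # $_[0] is an alias for the caller's variable
    return;
}

# Hypothetical helper: copies the argument first, so the caller is untouched.
sub double_copy {
    my ($n) = @_;        # this copies the arg
    $n *= 2;
    return $n;
}

my $value = 21;
double_copy($value);
print "after double_copy: $value\n";       # caller's variable is unchanged
double_in_place($value);
print "after double_in_place: $value\n";   # caller's variable was doubled
```

Copying into a lexical first is what keeps double_copy from touching the caller's variable.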

One cannot pass an array or hash as an argument in Perl

The only thing that can be passed to a Perl sub is a list of scalars. (It's also the only thing that can be returned by one.)

Since @a evaluates to $a[0], $a[1], ... in list context,

foo(@a)

is the same as

foo($a[0], $a[1], ...)

That's why we create a reference to the array or hash we want to pass to a sub and pass the reference.

If we didn't, the array or hash would be evaluated into a list of scalars, and it would have to be reconstructed inside the sub. Not only is that expensive, it's impossible in cases like

foo(@a, @b)

because foo has no way to know how many arguments were returned by @a and how many were returned by @b.
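A small sketch of the difference, assuming two hypothetical subs (count_flat and sizes_by_ref) that are not part of the question:

```perl
use strict;
use warnings;

# Hypothetical sub: receives everything as one flat list of scalars.
sub count_flat {
    return scalar @_;
}

# Hypothetical sub: receives two array references, so the arrays stay separate.
sub sizes_by_ref {
    my ($first, $second) = @_;
    return (scalar @$first, scalar @$second);
}

my @first  = (1, 2, 3);
my @second = (4, 5);

print count_flat(@first, @second), "\n";                 # one combined count
print join(',', sizes_by_ref(\@first, \@second)), "\n";  # one size per array
```

With the flat call the sub only sees five scalars; with references it can still tell the two arrays apart.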

Note that it's possible to make it look like an array or hash is being passed as an argument using prototypes, but the prototype just causes a reference to the array/hash to be created automatically, and that's what is actually passed to the sub.
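For illustration only, here is a sketch of such a prototype. The sub name longer is hypothetical, and the sub has to be declared before the call for the prototype to take effect. Prototypes are rarely the right tool, so treat this as a demonstration rather than a recommendation.

```perl
use strict;
use warnings;

# Hypothetical sub with a reference prototype: each array argument
# is turned into a reference automatically at the call site.
sub longer(\@\@) {
    my ($x, $y) = @_;                 # both are array references
    return @$x >= @$y ? $x : $y;      # arrays in numeric context give their sizes
}

my @first  = (1, 2, 3);
my @second = (4, 5);

my $winner = longer(@first, @second);   # actually passes \@first, \@second
print scalar(@$winner), " elements\n";
```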

ikegami answered Oct 05 '22 04:10


For a couple of reasons you should use pass-by-reference, but the code you show returns the hash by value.

  • You should use my rather than local except for built-in variables like $/, and then for only as small a scope as possible.

  • Prototypes on subroutines are almost never a good idea. They do something very specific, and if you don't know what that is you shouldn't use them.

  • Calling subroutines using the ampersand sigil, as in &fromFile("english.txt"), hasn't been correct since Perl 4, about twenty years ago. It affects the parameters delivered to a subroutine in at least two different ways and is a bad idea.

  • I'm not sure why you are using a file glob with my $string = <$_[0]>. Are you expecting wildcards in the filename passed as the parameter? If so then you will be opening and reading only the first matching file, otherwise the glob is unnecessary.

  • Lexical file handles like $fh are better than bareword file handles like FILE, and will be closed implicitly when they are destroyed - usually at the end of the block where they are declared.

  • I am not sure how your hash %counts gets populated. No regex on its own can fill a hash, but I will have to trust you!
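One of the effects of the ampersand mentioned above can be shown in a few lines. This is a sketch with made-up sub names (show_args, outer): calling &name with no parentheses hands the callee the caller's current @_.

```perl
use strict;
use warnings;

# Hypothetical sub: reports how many arguments it received.
sub show_args {
    return scalar @_;
}

sub outer {
    my $with_amp = &show_args;     # no parentheses: callee shares outer's @_
    my $normal   = show_args();    # ordinary call: empty argument list
    return ($with_amp, $normal);
}

my ($amp, $plain) = outer('a', 'b', 'c');
print "with ampersand: $amp, normal call: $plain\n";
```

The other effect is that &name(...) bypasses any prototype on the sub.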

Try this version. People familiar with Perl will thank you (ironically!) for not using camel-case variable names. And it is rare to see a main subroutine declared and called. That is C, this is Perl.

Update: I have changed this code to do what your original regex did.

use strict;
use warnings;

sub from_file {

    my ($filename) = @_;

    my $contents = do {
        open my $fh, '<', $filename or die qq{Unable to open "$filename": $!};
        local $/;
        my $contents = <$fh>;
    };

    my %counts;
    $counts{lc $1}++ while $contents =~ /(?=(\pL{2}))/g;

    return \%counts;
}

sub main {
    my $english_map = from_file('english.txt');
    my $german_map  = from_file('german.txt');
}

main();
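
Since from_file now returns a hash reference, the caller dereferences it to read the counts. Here is a sketch that does not need any files, assuming a hypothetical count_pairs sub that applies the same counting line to a string:

```perl
use strict;
use warnings;

# Hypothetical variant of from_file that counts letter pairs in a string.
sub count_pairs {
    my ($text) = @_;
    my %counts;                                       # a fresh hash on every call
    $counts{lc $1}++ while $text =~ /(?=(\pL{2}))/g;
    return \%counts;
}

my $english = count_pairs('abab');
my $german  = count_pairs('Baba');

# Dereference with %$ref to iterate and with $ref->{key} to read one value.
for my $pair (sort keys %$english) {
    print "$pair: $english->{$pair}\n";
}
```

Because %counts is a lexical declared inside the sub, each call returns a reference to a separate hash, so the two maps cannot conflict.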
Borodin answered Oct 05 '22 03:10


You can either pass a reference or pass the entire hash or array. Your choice. There are two issues that might make you choose one over the other:

  1. Passing other parameters
  2. Memory Management

Perl doesn't really have subroutine parameters. Instead, you're simply passing in a single flat list of arguments, which arrives in @_. What if your subroutine needs to work out which of two arrays has more elements? I couldn't do this:

foo(@first, @second);

because all I'd be passing in is one big list that combines all the members of both. This is true with hashes too. Imagine a program that takes two hashes and finds the keys they have in common:

@common_keys = common(%hash1, %hash2);

Again, I'm combining all the keys and their values in both hashes into one big list.

The only way around this issue is to pass a reference:

foo(\@first, \@second);
@common_keys = common(\%hash1, \%hash2);
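
The answer doesn't show the body of common, so the following is only one possible sketch of it, taking two hash references:

```perl
use strict;
use warnings;

# Hypothetical implementation of common(): returns the keys present in both hashes.
sub common {
    my ($h1, $h2) = @_;                       # two hash references
    my @shared = grep { exists $h2->{$_} } keys %$h1;
    @shared = sort @shared;                   # sorted only to make the order predictable
    return @shared;
}

my %hash1 = (a => 1, b => 2, c => 3);
my %hash2 = (b => 9, c => 8, d => 7);

my @common_keys = common(\%hash1, \%hash2);
print "@common_keys\n";
```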

In this case, I'm passing the memory locations where the two arrays or hashes are stored. My subroutine can use those references. However, you do have to take some care, which I'll explain under the second reason.

The second reason to pass a reference is memory management. If my array or hash is a few dozen entries, it really doesn't matter all that much. However, imagine I have 10,000,000 entries in my hash or array. Copying all those members could take quite a bit of time. Passing a reference saves me that time and memory, but at a terrible cost. Most of the time, I'm using subroutines as a way of not affecting my main program. This is why subroutines are supposed to use their own variables and why most programming courses teach variable scope.

However, when I pass a reference, I'm breaking that scope. Here's a simple program that doesn't pass a reference.

#! /usr/bin/env perl
use strict;
use warnings;

my @array = qw(this that the other);

foo (@array);

print join ( ":", @array ) . "\n";

sub foo {
    my @foo_array = @_;
    $foo_array[1] = "FOO";
}

Note that the subroutine foo (see footnote 1) changes the second element of the array it was given. However, even though I pass @array into foo, the subroutine doesn't change the value of @array. That's because the subroutine is working on a copy (created by my @foo_array = @_;). Once the subroutine exits, the copy disappears.

When I execute this program, I get:

this:that:the:other

Now, here's the same program, except I'm passing in a reference, and in the interest of memory management, I use that reference:

#! /usr/bin/env perl
use strict;
use warnings;

my @array = qw(this that the other);

foo (\@array);

print join ( ":", @array ) . "\n";

sub foo {
    my $foo_array_ref = shift;
    $foo_array_ref->[1] = "FOO";
}

When I execute this program, I get:

this:FOO:the:other

That's because I don't pass in the array, but a reference to that array. It's the same memory location that holds @array. Thus, changing the data through the reference in my subroutine changes it in my main program. Most of the time, you do not want to do this.

You can get around this by passing in a reference, then copying the array it points to into a new array. For example, if I had done this:

sub foo {
    my @foo_array = @{ shift() };
    $foo_array[1] = "FOO";    # changes only the local copy
}

I would be making a copy of the array behind my reference. It protects my variables, but it does mean I'm copying my array into another array, which takes time and memory. Back in the 1980s when I first was programming, this was a big issue. However, in this age of gigabyte memory and quad-core processors, the main issue isn't memory management, but maintainability. Even if your array or hash contained 10 million entries, you'll probably not notice any time or memory issues.

This works the other way around too. I could return from my subroutine either a reference to a hash or the entire hash. Many people like returning a reference, but this could be problematic.

In object oriented Perl programming, I use references to keep track of my objects. Normally, I'll have a reference to a hash I can use to store other values, arrays, and hashes.

In a recent program, I was counting IDs and how many times they are referenced in a log file. This was stored in an object (which is just a reference to a hash). I had a method that would return the entire hash of IDs and their counts. I could have done this:

return $self->{COUNT_HASH};

But what would happen if the user started modifying the hash through the reference I returned? They would actually be manipulating my object without using my methods to add and subtract from the IDs. Not something that I want them to do. Instead, I create a new hash, and then return a reference to that hash:

my %hash_counts = %{ $self->{COUNT_HASH} };
return \%hash_counts;

This copies the hash my reference points to into a new hash, and then I return a reference to the new hash. This protects my data from outside manipulation. I still return a reference, but the user no longer has access to my object's data without going through my methods.
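
Here is a self-contained sketch of that idea. The Counter class and its new, add and totals methods are invented for the example; only the COUNT_HASH key comes from the snippet above. Note that this is a shallow copy, so nested references inside the hash would still be shared.

```perl
use strict;
use warnings;

package Counter;   # hypothetical class, for illustration only

sub new {
    my ($class) = @_;
    return bless { COUNT_HASH => {} }, $class;
}

sub add {
    my ($self, $id) = @_;
    $self->{COUNT_HASH}{$id}++;
    return;
}

# Returns a reference to a copy, never to the internal hash itself.
sub totals {
    my ($self) = @_;
    my %hash_counts = %{ $self->{COUNT_HASH} };
    return \%hash_counts;
}

package main;

my $counter = Counter->new;
$counter->add('id1');

my $copy = $counter->totals;
$copy->{id1} = 999;                        # only the caller's copy changes
print $counter->totals->{id1}, "\n";       # the object's own count is intact
```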

By the way, I like using wantarray which gives the caller a choice on how they want their data:

my %hash_counts = %{ $self->{COUNT_HASH} };
return wantarray ? %hash_counts : \%hash_counts;

This allows me to return a reference or a hash depending how the user called my object:

my %hash_counts = $object->totals();      # Returns a hash
my $hash_counts_ref = $object->totals();  # Returns a reference to a hash
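
A runnable sketch of the same pattern outside of an object, assuming a file-level %internal hash that stands in for $self->{COUNT_HASH}:

```perl
use strict;
use warnings;

my %internal = (id1 => 2, id2 => 5);   # stand-in for $self->{COUNT_HASH}

sub totals {
    my %hash_counts = %internal;                        # copy, as before
    return wantarray ? %hash_counts : \%hash_counts;    # list context or not?
}

my %as_hash = totals();     # list context: gets the key/value pairs
my $as_ref  = totals();     # scalar context: gets a hash reference

print ref($as_ref), " with ", scalar(keys %as_hash), " keys\n";
```

One thing to keep in mind with this design: any list context triggers the first branch, so print totals() or passing totals() as a sub argument also gets the flattened pairs rather than a reference.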

Footnote 1: The elements of @_ are aliases for the arguments in the caller. Thus, if I called foo(@array) and then did $_[1] = "foo"; inside foo, I would be changing the second element of @array.

David W. answered Oct 05 '22 04:10