
In Perl, when assigning a subroutine's return value to a variable, is the data duplicated in memory?

sub foo {
    my @return_value = (1, 2);
}
my @receiver = foo();

Is this assignment like any other assignment in Perl, i.e. is the array duplicated in memory? I doubt it, because the array held by the subroutine is disposable, so duplicating it would be redundant. It would make sense to just 'link' the array to @receiver as an optimization.

By the way, I noticed a similar question, "Perl: function returns reference or copy?", but it didn't give me the answer I was looking for.

I'm talking about Perl 5.

P.S. Are there any books or other materials on this sort of topic in Perl?

nichijou asked Dec 08 '17 17:12


2 Answers

The scalars returned by :lvalue subs aren't copied.

The scalars returned by XS subs aren't copied.

The scalars returned by functions (named operators) aren't copied.

The scalars returned by other subs are copied.

But that's before any assignment comes into play. If you assign the returned values to a variable, you will be copying them (again, in the case of a normal Perl sub).

This means my $y = sub { $x }->(); copies $x twice!

But that doesn't really matter because of optimizations.


Let's start with an example of when they aren't copied.

$ perl -le'
    sub f :lvalue { my $x = 123; print \$x; $x }
    my $r = \f();
    print $r;
'
SCALAR(0x465eb48)  # $x
SCALAR(0x465eb48)  # The scalar on the stack

But if you remove :lvalue...

$ perl -le'
    sub f { my $x = 123; print \$x; $x }
    my $r = \f();
    print $r;
'
SCALAR(0x17d0918)  # $x
SCALAR(0x17b1ec0)  # The scalar on the stack

Worse, one usually follows up by assigning the scalar to a variable, so a second copy occurs.

$ perl -le'
    sub f { my $x = 123; print \$x; $x }
    my $r = \f();   # \
    print $r;       #  > my $y = f();
    my $y = $$r;    # /
    print \$y;
'
SCALAR(0x1802958)  # $x
SCALAR(0x17e3eb0)  # The scalar on the stack
SCALAR(0x18028f8)  # $y
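
The one-liners above can be condensed into a single script. This is only a sketch: the names by_copy and by_alias are made up for illustration, and $x is file-scoped (unlike the my $x inside f above) so that it stays alive while the references are compared.

```perl
use strict;
use warnings;

my $x = 123;                      # file-scoped so it outlives both calls

sub by_copy          { $x }       # hypothetical name: ordinary sub
sub by_alias :lvalue { $x }       # hypothetical name: lvalue sub

my $copy_ref  = \by_copy();       # reference to the scalar left on the stack
my $alias_ref = \by_alias();

# Comparing references with == compares the addresses of the scalars.
print "plain sub returned \$x itself?  ", ($copy_ref  == \$x ? "yes" : "no"), "\n";
print "lvalue sub returned \$x itself? ", ($alias_ref == \$x ? "yes" : "no"), "\n";
```

Because $x is still alive when the comparison runs, a "no" can only mean that the sub handed back a different scalar, i.e. a copy.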

On the plus side, assignment is optimized to minimize the cost of copying strings.

XS subs and functions (named operators) typically return mortal ("TEMP") scalars. These are scalars "on death row". They will be automatically destroyed if nothing steps in to claim a reference to them.

In older versions of Perl (<5.20), assigning a mortal string to another scalar will cause ownership of the string buffer to be transferred to avoid having to copy the string buffer. For example, my $y = lc($x); doesn't copy the string created by lc; only the string pointer is copied.

$ perl -MDevel::Peek -e'my $s = "abc"; Dump($s); $s = lc($s); Dump($s);'
SV = PV(0x1705840) at 0x1723768
  REFCNT = 1
  FLAGS = (PADMY,POK,IsCOW,pPOK)
  PV = 0x172d4c0 "abc"\0
  CUR = 3
  LEN = 10
  COW_REFCNT = 1
SV = PV(0x1705840) at 0x1723768
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0x1730070 "abc"\0     <-- Note the change of address from stealing
  CUR = 3                        the buffer from the scalar returned by lc.
  LEN = 10

In newer versions of Perl (≥5.20), the assignment operator never[1] copies the string buffer. Instead, newer versions of Perl use a copy-on-write ("COW") mechanism.

$ perl -MDevel::Peek -e'my $x = "abc"; my $y = $x; Dump($x); Dump($y);'
SV = PV(0x26b0530) at 0x26ce230
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x26d68a0 "abc"\0            <----+
  CUR = 3                                |
  LEN = 10                               |
  COW_REFCNT = 2                         +-- Same buffer (0x26d68a0)
SV = PV(0x26b05c0) at 0x26ce248          |
  REFCNT = 1                             |
  FLAGS = (POK,IsCOW,pPOK)               |
  PV = 0x26d68a0 "abc"\0            <----+
  CUR = 3
  LEN = 10
  COW_REFCNT = 2

Ok, so far, I've only talked about scalars. Well, that's because subs and functions can only return scalars[2].

In your example, the scalars assigned to @return_value would be returned[3], copied, then copied a second time into @receiver by the assignment.
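
A minimal sketch of what that means for the caller, not a measurement of how many copies happen internally. @inner_refs is a hypothetical helper added purely for illustration: it keeps references to the elements of @return_value alive so they can be compared with the elements of @receiver afterwards.

```perl
use strict;
use warnings;

my @inner_refs;    # hypothetical helper, not part of the original question

sub foo {
    my @return_value = (1, 2);
    # map aliases $_ to each element, so these point at the real elements.
    @inner_refs = map { \$_ } @return_value;
    return @return_value;
}

my @receiver = foo();
$receiver[0] = 99;                # change the caller's copy only

print "element inside the sub: ${ $inner_refs[0] }\n";
print "element in \@receiver:   $receiver[0]\n";
print "same scalar? ", (\$receiver[0] == $inner_refs[0] ? "yes" : "no"), "\n";
```

The element kept alive from inside the sub still holds 1 after $receiver[0] is changed, so @receiver owns separate scalars.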

You could avoid all of this by returning a reference to the array.

sub f { my @fizbobs = ...; \@fizbobs }
my $fizbobs = f();

The only thing copied there is a reference, the simplest non-undefined scalar.
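
A small sketch of the same idea with the elided list filled in by placeholder values. The values (1, 2, 3) and the $inner variable are assumptions added for illustration; $inner just remembers the reference taken inside the sub so it can be compared with what the caller receives.

```perl
use strict;
use warnings;

my $inner;    # hypothetical: remembers the reference taken inside the sub

sub f {
    my @fizbobs = (1, 2, 3);      # placeholder data
    $inner = \@fizbobs;
    return \@fizbobs;
}

my $fizbobs = f();

print "same array? ", ($fizbobs == $inner ? "yes" : "no"), "\n";

push @$fizbobs, 4;                # modify through the caller's reference
print "elements seen through \$inner: ", scalar(@$inner), "\n";
```

Both references point at the one array created inside f, so the push made by the caller is visible through $inner as well.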


  1. Ok, maybe not never. I think there needs to be a free byte in the string buffer to hold the COW count.

  2. In list context, they can return 0, 1 or many of them, but they can only return scalars.

  3. The last operator of your sub is a list assignment operator. In list context, the list assignment operator returns the scalars to which its left-hand side (LHS) evaluates. See Scalar vs List Assignment Operator for more info.
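
Note 3 can be checked with a short sketch. The sub name implicit and the values are made up for illustration; the sub has the same shape as foo in the question, ending in a list assignment with no explicit return.

```perl
use strict;
use warnings;

# Hypothetical sub with the same shape as foo in the question.
sub implicit {
    my @return_value = (10, 20, 30);
}

my @receiver = implicit();        # list context
my $count    = implicit();        # scalar context

print "list context:   @receiver\n";
print "scalar context: $count\n";
```

In list context the caller gets the three elements. In scalar context the list assignment yields the number of elements on its right-hand side, so $count is 3 rather than the last element.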

ikegami answered Oct 06 '22 02:10


The subroutine returns the result of the last operation if you don't specify an explicit return.

@return_value is created separately from @receiver, and its values are copied. The memory used by @return_value is released when it goes out of scope at subroutine exit.

So yes - the memory used is duplicated.

If you desperately want to avoid this, you can create an anonymous array once, and 'pass' a reference to it around:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

sub foo {
    my $anon_array_ref = [ 1, 2 ];
    return $anon_array_ref; 
}

my $results_from_foo = foo(); 

print Dumper $results_from_foo;

This will usually be premature optimisation though, unless you know you're dealing with really big data structures.

Note - you should probably include an explicit return statement in your sub after the assignment (for example return @return_value;), as it's good practice to make clear what you're doing. A bare return; would return an empty list instead.

Sobrique answered Oct 06 '22 02:10