
Passing large values to modules

I had a program that was slow, and was trying to improve performance. The script "use"s a sub in a module, and passes an array that's quite large to the sub. After some tinkering, I realized if I moved the sub directly into the parent script, and made the array global instead of local (so I didn't have to pass it), the script was massively faster (running in minutes where it was taking days).

I'd really like to be able to have that sub in the module (because I have many scripts that call that same sub). But I'd also like it to be fast. :-)

Semi-pseudocode

page.pl:

package Page;

use Star;
my @fileBytes=();
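# Three-argument open with a lexical filehandle; stop if the file can't be opened.
open(my $StarFile, '<', $File) or die "Cannot open $File: $!";
binmode($StarFile);
# Read the file one byte at a time; each byte becomes one array element.
while (read($StarFile, $FileValues, 1)) {
  push @fileBytes, $FileValues;
}
close($StarFile);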

&parseBlock(\@fileBytes);

Star.pm:

package Star;

sub parseBlock {
  my ($fileBytes) = @_;
  my @fileBytes = @{ $fileBytes };

  ...
}

Some reading here: https://www.perlmonks.org/?node=Variable%20Scoping%20in%20Perl%3A%20the%20basics tells me I want to deal with scoping. So if I define @fileBytes with "our" instead of "my", it becomes a package variable. As best I can tell, that would normally be in the module file. But I'm starting with the value in the parent.

So I can make the parent also a package, define: our @fileBytes

and then reference it from the module as at least something like: @Page::fileBytes

I think I have that at least right in theory.

My problem appears when I want to use the sub from a different script:

other.pl:

package Other;
use Star;

  my @fileBytes=();
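  # Three-argument open with a lexical filehandle; stop if the file can't be opened.
  open(my $StarFile, '<', $File) or die "Cannot open $File: $!";
  binmode($StarFile);
  # Read the file one byte at a time; each byte becomes one array element.
  while (read($StarFile, $FileValues, 1)) {
    push @fileBytes, $FileValues;
  }
  close($StarFile);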

&parseBlock(\@fileBytes, $offset);

Now the value I'm passing is @Other::fileBytes. That problem expands the more I use my library.

What I'd like is to keep the subroutine in the module but not have to pass the @fileBytes data (passing, I believe, creates a new value, which must be slow), because the data is "global", and to do it in such a way that I can still use the centralized sub from every script.

Rick S asked Mar 02 '23 22:03


1 Answer

You can't pass arrays to subs, only scalars. When one uses f(@a), one is passing the elements of the array. This doesn't create any new scalars or make copies of any of the scalars, so it's actually pretty fast nonetheless.
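A minimal sketch of that behaviour, assuming a hypothetical sub named shout (not part of the original code): the elements of @_ are aliases to the caller's scalars, so changing them inside the sub changes the caller's array, which shows that nothing was copied.

```perl
use strict;
use warnings;

# Hypothetical sub: upper-cases its arguments in place through @_.
sub shout {
    # Each element of @_ is an alias to the caller's scalar, not a copy.
    $_ = uc $_ for @_;
    return scalar @_;    # number of elements received
}

my @a = qw( x y z );
my $count = shout(@a);   # passes the three elements of @a

print "$count: @a\n";    # the caller's array was modified in place
```

The copy only happens if the sub itself assigns the arguments to a new variable, for example with my @copy = @_;.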

However, even that small cost can be avoided. This is done by passing a reference to an array: f(\@a). This does create a single scalar, but it's the lightest of them all.

This is what you're already doing, so from the point of view of calling the sub, you already have the fastest. The problem you are facing is the result of what you do immediately after the sub is called: You create a new array and copy every element of the provided array into this new array.

my @fileBytes = @{ $fileBytes };  # Copies every element.

Remove that line, and your problem is fixed. Of course, you'll need to change any code that used the duplicate array (@fileBytes) to use the original array (@$fileBytes) instead. The only caveat is that any changes to the array will be reflected in the array passed via reference to the sub, since it's the same array.
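As a rough illustration, here is what a sub that works directly through the reference might look like. The body is a hypothetical stand-in (it just sums byte values from an offset), since the real contents of parseBlock are not shown in the question, and it is kept in a single file rather than in the Star module so it can be run as-is.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real parseBlock: sums the byte values
# from $offset onward, reading through the reference without copying.
sub parseBlock {
    my ($fileBytes, $offset) = @_;
    $offset //= 0;                            # default when no offset is passed

    my $sum = 0;
    for my $i ($offset .. $#$fileBytes) {     # $#$fileBytes is the last index
        $sum += ord $fileBytes->[$i];         # element access via the reference
    }
    return $sum;
}

my @fileBytes = ("A", "B", "C");
my $total = parseBlock(\@fileBytes, 1);       # ord("B") + ord("C")
print "$total\n";
```

The usual translations are $fileBytes[$i] to $fileBytes->[$i], $#fileBytes to $#$fileBytes, and scalar(@fileBytes) to scalar(@$fileBytes).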


Alternative Solution

If you insist on avoiding working with references, you can use the following:

use experimental qw( declared_refs );

my \@fileBytes = $fileBytes;

This effectively makes @fileBytes an alias for @$fileBytes. No copying is involved. It's not free, but it's not expensive either (O(1)). Just as with modifying @$fileBytes directly, modifying @fileBytes will affect the array in the caller.
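A small sketch of the aliasing in action, assuming Perl 5.26 or later and a hypothetical sub named zeroFirst. The refaliasing feature is listed explicitly alongside declared_refs here as a precaution, since assigning to a reference depends on it.

```perl
use strict;
use warnings;
use experimental qw( refaliasing declared_refs );   # needs Perl 5.26+

# Hypothetical sub: overwrites the first byte through the alias.
sub zeroFirst {
    my ($fileBytes) = @_;
    my \@fileBytes = $fileBytes;    # @fileBytes is now an alias, not a copy
    $fileBytes[0] = "\0";           # writes into the caller's array
    return scalar @fileBytes;       # element count, read through the alias
}

my @data = ("A", "B");
my $n = zeroFirst(\@data);
print ord($data[0]), " $n\n";       # the caller's first element was changed
```

Inside the sub, the rest of the code can keep using plain @fileBytes syntax, which is the main appeal of this approach when a lot of existing code would otherwise need rewriting.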

Generally speaking, one should avoid experimental code in production, but the devs are planning to enable it by default in the next major version of Perl, so they surely consider it quite stable.

ikegami answered Mar 05 '23 17:03