Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to code a sub with the same signature as sort or reduce in List::Util

Tags:

perl

I would like to be able to wtite a function that I can use like this:

my @grouped = split_at {
      $a->{time}->strftime("%d-%b-%Y %H") ne 
      $b->{time}->strftime("%d-%b-%Y %H") 
} @files;

where split_at splits an array into an array of arrayrefs based on a function, and looks like this:

sub split_at(&@) {
# split an array into an arrayrefs based on a function
    my $cb = shift;

    my @input = @_;

    my @retval = ( [  ] );

    $i = 0;
    while ($i <= $#input) {
        push @{$retval[$#retval]}, $input[$i];
        $i++;
        if (($i < $#input) && $cb->($input[$i-1], $input[$i])) { push @retval, [] }
    }
    pop @retval unless @{$retval[$#retval]};
    return @retval;
}

For now I can only call it like this:

my @grouped = split_at { 
          $_[0]->{time}->strftime("%d-%b-%Y %H") ne 
          $_[1]->{time}->strftime("%d-%b-%Y %H") 
} @files;

where this batches files by the mtime hour using Time::Piece.

I'm trying to find out what's the way to be able to call it as (simplified):

my @foo = split_at { $a <=> $b } @foo;

in a similar way as sort or List::Util::reduce

I have checked the code to reduce in List::Util::PP for reference but I don't understand it enough to port it to my case.

like image 575
simone Avatar asked Feb 04 '21 10:02

simone


1 Answers

The main thing you need to do is assign the values of your lexical first and second value into the caller's namespace as $a and $b. reduce does this with one value:

#     /- 1
#     |  /-3
#     |  |             /-4 
# 0   |  |             |
local *{"${pkg}::a"} = \$a;
#      \-----------/
#             2

Let's quickly look at this:

  1. local overrides a global variable in this scope and all scopes that are contained within it temporarily. So when we're calling the callback, that variable will have a different value. Remember that $a and $b are special globals.

  2. It uses a glob * to assign into the symbol table in that namespace, and it's smart enough to find the right slot in the symbol table. Think of a chest of drawers with one drawer for scalar, one for array, one for hash and so on. You might have seen this syntax for installing subs too.

  3. The {""} syntax allows to build a variable name up out of multiple pieces and other variables. We're using a package and the variable name a to get, for example, main::a. This requires strict 'refs' to be turned off, or perl will complain, so there's a no strict 'refs' higher up in the code. The * from 1 indicates that we're using this var name as type glob.

  4. This is similar to 1, but with the scalar slot this time. In this case, we wouldn't have had to disable strict references because it's just a plain string, and Perl considers that safe. That's the syntax to tell the interpreter that a variable name ends. Compare these two:

    my $foo, $foo_bar;
    "$foo_bar_baz";    # variable doesn't exist error
    "${foo}_bar_baz";  # prints value of $foo and string _bar_baz
    "${foo_bar}_baz";  # prints value of $foo_bar and string _baz
    

    We need this so that we don't get the value of a $a in the package pkg.

  5. We assign a reference to our lexical $a to that slot in the type glob from 1. This is going to be a scalar reference. Essentially the variable $a for us now has another name $pkg::a. It's similar to importing a sub into your namespace when you say use Foo 'bar'.

Having looked at this, we can update your code.

sub split_at(&@) {
    my $cb = shift;
    my @input = @_;

    my @retval = [];

    my $pkg = caller;
    my $max = $#input;

    for my $i (0 .. $max) {
        push @{$retval[$#retval]}, $input[$i];
        no strict 'refs';
        local *{"${pkg}::a"} = \$input[$i];
        local *{"${pkg}::b"} = \$input[$i + 1]; # autovivification!
        if (($i < $max) && $cb->()) {
            push @retval, [];
        }
    }
    pop @retval unless @{$retval[$#retval]};
    return @retval;
}


my @files = map { { foo => $_ } } qw/ a a b b c d /;

my @grouped = split_at {
          $a->{foo} ne
          $b->{foo}
} @files;

I've simplified the data structure for our example.

We need both $a and $b for every iteration of the loop. Because assigning the next element to the one we're looking at to $b via the type glob causes autovivification I had to change your loop type to a counter and introduce a new variable $max. I ran into an endless loop while building this as it kept putting undef elements at the end of @input.

Apart from that, it's pretty much the same code. We don't need to pass arguments to the callback any more, and we need to have no strict 'refs'. You typically turn strict off for as small a scope as possible. We also need to take the caller's namespace so we can put variables there.

There is one more thing you need to look out for. List::Util::PP sets up $a and $b in the caller's namespace because we might cause warnings otherwise, so you should probably have the same code it uses if you want to put this function into a library.

sub import {
  my $pkg = caller;
 
  # (RT88848) Touch the caller's $a and $b, to avoid the warning of
  #   Name "main::a" used only once: possible typo" warning
  no strict 'refs';
  ${"${pkg}::a"} = ${"${pkg}::a"};
  ${"${pkg}::b"} = ${"${pkg}::b"};
 
  goto &Exporter::import;
}
like image 98
simbabque Avatar answered Nov 15 '22 09:11

simbabque