Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I partition a Perl array into equal sized chunks?

Tags:

perl

I have a fixed-sized array where the size of the array is always in factor of 3.

my @array = ('foo', 'bar', 'qux', 'foo1', 'bar', 'qux2', 3, 4, 5);

How can I cluster the member of array such that we can get an array of array group by 3:

$VAR = [ ['foo','bar','qux'],
         ['foo1','bar','qux2'],
         [3, 4, 5] ];
like image 816
neversaint Avatar asked Sep 29 '09 06:09

neversaint


People also ask

What function split an array into chunks?

The array_chunk() function splits an array into chunks of new arrays.

What is Perl splice?

In Perl, the splice() function is used to remove and return a certain number of elements from an array. A list of elements can be inserted in place of the removed elements. Syntax: splice(@array, offset, length, replacement_list) Parameters: @array – The array in consideration.


2 Answers

I really like List::MoreUtils and use it frequently. However, I have never liked the natatime function. It doesn't produce output that can be used with a for loop or map or grep.

I like to chain map/grep/apply operations in my code. Once you understand how these functions work, they can be very expressive and very powerful.

But it is easy to make a function to work like natatime that returns a list of array refs.

sub group_by ($@) {
    my $n     = shift;
    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, [ splice @array, 0, $n ] while @array;

    return @groups;
}

Now you can do things like this:

my @grouped = map [ reverse @$_ ],
              group_by 3, @array;

** Update re Chris Lutz's suggestions **

Chris, I can see merit in your suggested addition of a code ref to the interface. That way a map-like behavior is built in.

# equivalent to my map/group_by above
group_by { [ reverse @_ ] } 3, @array;

This is nice and concise. But to keep the nice {} code ref semantics, we have put the count argument 3 in a hard to see spot.

I think I like things better as I wrote it originally.

A chained map isn't that much more verbose than what we get with the extended API. With the original approach a grep or other similar function can be used without having to reimplement it.

For example, if the code ref is added to the API, then you have to do:

my @result = group_by { $_[0] =~ /foo/ ? [@_] : () } 3, @array;

to get the equivalent of:

my @result = grep $_->[0] =~ /foo/,
             group_by 3, @array;

Since I suggested this for the sake of easy chaining, I like the original better.

Of course, it would be easy to allow either form:

sub _copy_to_ref { [ @_ ] }

sub group_by ($@) {
    my $code = \&_copy_to_ref;
    my $n = shift;

    if( reftype $n eq 'CODE' ) {
        $code = $n;
        $n = shift;
    }

    my @array = @_;

    croak "group_by count argument must be a non-zero positive integer"
        unless $n > 0 and int($n) == $n;

    my @groups;
    push @groups, $code->(splice @array, 0, $n) while @array;

    return @groups;
}

Now either form should work (untested). I'm not sure whether I like the original API, or this one with the built in map capabilities better.

Thoughts anyone?

** Updated again **

Chris is correct to point out that the optional code ref version would force users to do:

group_by sub { foo }, 3, @array;

Which is not so nice, and violates expectations. Since there is no way to have a flexible prototype (that I know of), that puts the kibosh on the extended API, and I'd stick with the original.

On a side note, I started with an anonymous sub in the alternate API, but I changed it to a named sub because I was subtly bothered by how the code looked. No real good reason, just an intuitive reaction. I don't know if it matters either way.

like image 45
daotoad Avatar answered Oct 13 '22 15:10

daotoad


my @VAR;
push @VAR, [ splice @array, 0, 3 ] while @array;

or you could use natatime from List::MoreUtils

use List::MoreUtils qw(natatime);

my @VAR;
{
  my $iter = natatime 3, @array;
  while( my @tmp = $iter->() ){
    push @VAR, \@tmp;
  }
}
like image 194
Brad Gilbert Avatar answered Oct 13 '22 15:10

Brad Gilbert