Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I take n elements at random from a Perl array?

Tags:

perl

I have an array, A = [a1,a2,a3,...aP] with size P. I have to sample q elements from array A.

I plan to use a loop with q iterations, and randomly pick a element from A at each iteration. But how can I make sure that the picked number will be different at each iteration?

like image 247
bit-question Avatar asked Jan 22 '12 17:01

bit-question


2 Answers

The other answers all involve shuffling the array, which is O(n). It means modifying the original array (destructive) or copying the original array (memory intensive).

The first way to make it more memory efficient is not to shuffle the original array but to shuffle an array of indexes.

# Shuffled list of indexes into @deck
my @shuffled_indexes = shuffle(0..$#deck);

# Get just N of them.
my @pick_indexes = @shuffled_indexes[ 0 .. $num_picks - 1 ];  

# Pick cards from @deck
my @picks = @deck[ @pick_indexes ];

It is at least independent of the content of the @deck, but its still O(nlogn) performance and O(n) memory.

A more efficient algorithm (not necessarily faster, depends on now big your array is) is to look at each element of the array and decide if it's going to make it into the array. This is similar to how you select a random line from a file without reading the whole file into memory, each line has a 1/N chance of being picked where N is the line number. So the first line has a 1/1 chance (it's always picked). The next has a 1/2. Then 1/3 and so on. Each pick will overwrite the previous pick. This results in each line having a 1/total_lines chance.

You can work it out for yourself. A one line file has a 1/1 chance so the first one is always picked. A two line file... the first line has a 1/1 then a 1/2 chance of surviving, which is 1/2, and the second line has a 1/2 chance. For a three line file... the first line has a 1/1 chance of being picked, then a 1/2 * 2/3 chance of surviving which is 2/6 or 1/3. And so on.

The algorithm is O(n) for speed, it iterates through an unordered array once, and does not consume any more memory than is needed to store the picks.

With a little modification, this works for multiple picks. Instead of a 1/$position chance, it's $picks_left / $position. Each time a pick is successful, you decrement $picks_left. You work from the high position to the low one. Unlike before, you don't overwrite.

my $picks_left = $picks;
my $num_left = @$deck;
my @picks;
my $idx = 0;
while($picks_left > 0 ) {  # when we have all our picks, stop
    # random number from 0..$num_left-1
    my $rand = int(rand($num_left));

    # pick successful
    if( $rand < $picks_left ) {
        push @result, $deck->[$idx];
        $picks_left--;
    }

    $num_left--;
    $idx++;
}

This is how perl5i implements its pick method (coming next release).

To understand viscerally why this works, take the example of picking 2 from a 4 element list. Each should have a 1/2 chance of being picked.

1. (2 picks, 4 items):         2/4 = 1/2

Simple enough. Next element has a 1/2 chance that an element will already have been picked, in which case it's chances are 1/3. Otherwise its chances are 2/3. Doing the math...

2. (1 or 2 picks,  3 items):   (1/3 * 1/2) + (2/3 * 1/2) = 3/6 = 1/2

Next has a 1/4 chance that both elements will already be picked (1/2 * 1/2), then it has no chance; 1/2 chance that only one will be picked, then it has 1/2; and the remaining 1/4 that no items will be picked in which case it's 2/2.

3. (0, 1 or 2 picks, 2 items): (0/2 * 1/4) + (1/2 * 2/4) + (2/2 * 1/4) = 2/8 + 1/4 = 1/2

Finally, for the last item, there's a 1/2 the previous took the last pick.

4. (0 or 1 pick, 1 items):     (0/1 * 2/4) + (1/1 * 2/4) = 1/2

Not exactly a proof, but good for convincing yourself it works.

like image 120
3 revs, 2 users 98% Avatar answered Oct 24 '22 02:10

3 revs, 2 users 98%


From perldoc perlfaq4:

How do I shuffle an array randomly?

If you either have Perl 5.8.0 or later installed, or if you have Scalar-List-Utils 1.03 or later installed, you can say:

use List::Util 'shuffle';
@shuffled = shuffle(@list);

If not, you can use a Fisher-Yates shuffle.

sub fisher_yates_shuffle {

    my $deck = shift;  # $deck is a reference to an array
    return unless @$deck; # must not be empty!

    my $i = @$deck;
    while (--$i) {
        my $j = int rand ($i+1);
        @$deck[$i,$j] = @$deck[$j,$i];
    }
}


# shuffle my mpeg collection
# 

my @mpeg = <audio/*/*.mp3>;
fisher_yates_shuffle( \@mpeg );    # randomize @mpeg in place
print @mpeg;

You could also use List::Gen:

my $gen = <1..10>;
print "$_\n" for $gen->pick(5);  # prints five random numbers
like image 22
Zaid Avatar answered Oct 24 '22 04:10

Zaid