Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: Given a set of N elements, choose k at random, m times

Given a set of N elements, I want to choose m random, non-repeating subsets of k elements.

If I was looking to generate all the N choose k combinations, I could have used itertools.combination, so one way to do what I m asking would be:

import numpy as np
import itertools
n=10
A = np.arange(n)
k=4
m=5
result = np.random.permutation([x for x in itertools.permutations(A,k)])[:m]
print(result)

The problem is of course that this code first generates all the possible permutations, and that this can be quite expensive.

Another suboptimal solution would be to choose each time a single permutation at random (e.g. choose-at-random-from-combinations, then sort to get permutation), and discard it if it has already been selected.

Is there a better way to do this?

like image 737
ntg Avatar asked Oct 16 '22 23:10

ntg


2 Answers

Your second solution seems to be the only practical way to do it. It will work well unless k is close to n and m is "large", in which case there will be more repetitions.

I added a count of the tries needed to get the samples we need. For m=50, with n=10 and k=4, it takes usually less than 60 tries. You can see how it goes with the size of your population and your samples.

You can use random.sample to get a list of k values without replacement, then sort it and turn it into a tuple. So, we can use a set for keeping only unique results.

import random

n = 10
A = list(range(n))
k = 4
m = 5

samples = set()
tries = 0
while len(samples) < m:
    samples.add(tuple(sorted(random.sample(A, k))))
    tries += 1

print(samples)
print(tries)

# {(1, 4, 5, 9), (0, 3, 6, 8), (0, 4, 7, 8), (3, 5, 7, 9), (1, 2, 3, 4)}
# 6
# 6 tries this time !
like image 153
Thierry Lathuille Avatar answered Oct 21 '22 00:10

Thierry Lathuille


The simplest way to do it is to random.shuffle(range) then take first k elements (need to be repeated until m valid samples are collected).

Of course this procedure cannot guarantee unique samples. You are to check a new sample against your historical hash if you really need it.

Since Pyton2.3, random.sample(range, k) can be used to produce a sample in a more efficient way

like image 44
AndreyS Scherbakov Avatar answered Oct 21 '22 00:10

AndreyS Scherbakov