How would I get a random subset of a set `s` in Python? I tried doing:
```python
from random import sample, randint

def random_subset(s):
    length = randint(0, len(s))
    # random.sample needs a sequence in Python 3.11+, so convert the set to a list
    return set(sample(list(s), length))
```
But I now realize that this obviously doesn't work, since the size of a uniformly random subset of `s` is not uniformly distributed from 0 to `n = len(s)`: there are far more subsets of size near `n/2` than of size 0 or `n`.
I'm sure I could compute that distribution and use NumPy's weighted sampling (for example `numpy.random.choice` with its `p` argument), but I'd prefer a pure-Python solution.
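To make the observation concrete: a uniformly random subset of an `n`-element set has size `k` with probability C(n, k) / 2^n, which peaks around `n/2` instead of being flat. A small sketch using `math.comb` (Python 3.8+) tabulates this exactly; `subset_size_distribution` is an illustrative name:

```python
from fractions import Fraction
from math import comb

def subset_size_distribution(n):
    """Exact probability that a uniformly random subset of an n-set has size k."""
    total = 2 ** n  # number of subsets of an n-element set
    return [Fraction(comb(n, k), total) for k in range(n + 1)]

# For n = 4 the sizes 0..4 have probabilities 1/16, 4/16, 6/16, 4/16, 1/16.
print(subset_size_distribution(4))
```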
In Python, you can randomly sample elements from a list with the `choice()`, `sample()`, and `choices()` functions of the `random` module. These functions can also be applied to strings and tuples. `choice()` returns one random element, while `sample()` and `choices()` return a list of multiple random elements (`sample()` without replacement, `choices()` with replacement).
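A minimal illustration of the three functions on a list, a string, and a tuple (the `items` list is just example data):

```python
import random

items = ["a", "b", "c", "d", "e"]

one = random.choice(items)            # a single element
distinct = random.sample(items, 3)    # 3 elements, no repeats (without replacement)
repeats = random.choices(items, k=3)  # 3 elements, repeats allowed (with replacement)

# The same functions work on other sequences such as strings and tuples.
letter = random.choice("xyz")
pair = random.sample((10, 20, 30), 2)
print(one, distinct, repeats, letter, pair)
```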
You can use `random.randint()` and `random.randrange()` to generate random numbers, but they can return the same number more than once. To get a list of unique random numbers, use the `sample()` function instead.
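A short sketch of the difference: repeated `randint` calls may produce duplicates, while `sample` over a `range` draws distinct values (the bounds 1 to 10 are arbitrary):

```python
import random

# randint may return the same value more than once across calls.
with_repeats = [random.randint(1, 10) for _ in range(5)]

# sample draws without replacement from range(1, 11), so all 5 values are distinct.
unique = random.sample(range(1, 11), 5)
print(with_repeats, unique)
```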
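`random.sample()` is a built-in function of Python's `random` module that returns a list of a given length with items chosen from a sequence such as a list, tuple, or string. It is used for random sampling without replacement. Sets used to be accepted as well, but since Python 3.11 (deprecated in 3.9) a set must first be converted to a sequence, for example with `sorted()` or `list()`.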
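Since Python 3.11, `random.sample` raises `TypeError` when given a set, so the set has to be converted first. Sorting additionally makes the result reproducible under a fixed seed; a plain `list(s)` also works if that doesn't matter. The fruit names are example data:

```python
import random

s = {"apple", "banana", "cherry", "date"}

# random.sample needs a sequence in Python 3.11+, so convert the set first.
# sorted() fixes the order, so the same seed yields the same result across runs.
picked = random.sample(sorted(s), 2)
print(picked)
```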
To generate a random sample without replacement, the legacy `numpy.random.choice` permutes the whole array each time we call it. When our sample size is only a fraction of the whole array length, we do not need to shuffle the array each time we want a sample. Let's just shuffle it once and take samples from the start of the shuffled array.
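One way to read the shuffle-once idea, sketched in pure Python with `random.shuffle` instead of NumPy (a swapped-in standard-library technique): shuffle a copy once, then cut consecutive slices from it. The helper name `take_samples` is hypothetical; consecutive slices are disjoint, so this suits drawing several non-overlapping samples:

```python
import random

def take_samples(population, size, count):
    """Hypothetical helper: shuffle a copy once, then cut `count`
    consecutive slices of length `size` (the samples are disjoint)."""
    if size * count > len(population):
        raise ValueError("not enough elements for disjoint samples")
    pool = list(population)
    random.shuffle(pool)  # one O(n) shuffle instead of one per sample
    return [pool[i * size:(i + 1) * size] for i in range(count)]

batches = take_samples(range(100), 10, 5)
print(batches)
```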
Python has `itertools.combinations(iterable, n)`, which returns an iterator over all length-`n` combinations (as tuples) of elements from the input iterable. This can be used to list all subsets of a given size of a set, and there are several ways to apply it.
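A short sketch of enumerating fixed-size subsets with `itertools.combinations`, plus picking one of them at random; the example set is arbitrary, and building the full list is only practical for small sets:

```python
from itertools import combinations
import random

s = {1, 2, 3, 4}

# All 2-element subsets, generated in the order of sorted(s).
pairs = list(combinations(sorted(s), 2))
print(pairs)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# Choosing one combination uniformly gives a random subset of fixed size.
random_pair = set(random.choice(pairs))
print(random_pair)
```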
Last Updated: 29 Aug, 2018. Syntax: `random.sample(population, k)`
I just realized I can simply go through each element in `s` and decide independently whether to keep it. Something like this:
```python
from random import randint

def random_subset(s):
    out = set()
    for el in s:
        # random coin flip
        if randint(0, 1) == 0:
            out.add(el)
    return out
```
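This has the correct distribution: each element is kept independently with probability 1/2, so each of the 2^n subsets is returned with probability 1/2^n.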
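The same per-element coin flip can be written as a set comprehension. As a rough empirical sanity check (not a proof), tallying the subsets of a 3-element set over many trials should give roughly equal counts for all 2^3 = 8 subsets. `random_subset` and `subset_counts` are illustrative names:

```python
import random
from collections import Counter

def random_subset(s):
    # Keep each element independently with probability 1/2.
    return {el for el in s if random.random() < 0.5}

def subset_counts(s, trials):
    # frozenset makes each sampled subset hashable so Counter can tally it.
    return Counter(frozenset(random_subset(s)) for _ in range(trials))

counts = subset_counts({"a", "b", "c"}, 80_000)
for subset, c in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), c)  # each count should be close to 80_000 / 8 = 10_000
```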
What subset you obtain will depend largely on the criterion you specify for including or excluding elements. If you have a function `criterion` that accepts an element and returns a Boolean to indicate inclusion in the subset, the actual creation process becomes simply:
```python
from random import randrange

def random_subset(s, criterion=lambda x: randrange(2)):
    return set(filter(criterion, s))
```
`filter` returns a lazy iterator, so the returned set is the only place the selection gets stored. The default criterion is very simple: `randrange(2)` returns 0 or 1 with equal probability, so each element is kept independently with probability 1/2 and every subset is equally likely. `randrange` is similar to `randint` except that its upper bound is exclusive. Since Python 3.2, both functions produce evenly distributed values regardless of range size.
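A quick check of the two bound conventions, together with a usage example of the `filter`-based `random_subset` (repeated here so the block runs on its own):

```python
from random import randint, randrange

def random_subset(s, criterion=lambda x: randrange(2)):
    # filter keeps elements for which criterion(x) is truthy (randrange(2) -> 0 or 1).
    return set(filter(criterion, s))

# randint(0, 1) includes both ends; randrange(2) excludes the stop value 2.
# Both can therefore only produce 0 or 1.
ints = {randint(0, 1) for _ in range(1000)}
ranges = {randrange(2) for _ in range(1000)}
print(ints, ranges)  # with 1000 draws each, both are almost certainly {0, 1}

sub = random_subset({1, 2, 3, 4, 5})
print(sub)
```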
You can further refine the criterion by using `random()`:
```python
from random import random
criterion = lambda x: random() < 0.5
```
Applying a threshold like that may seem like overkill, but it lets you adjust the distribution. You can have a function that generates criteria for whatever threshold you like:
```python
from random import random

def make_criterion(threshold=0.5):
    # keep each element with probability `threshold`
    return lambda x: random() < threshold
```
You could use it to get a smaller subset:

```python
random_subset(s, make_criterion(0.1))
```
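A self-contained sketch combining `random_subset` with `make_criterion`: with threshold `p`, each element is kept independently with probability `p`, so the expected subset size is `p * len(s)`. The seed and set size are arbitrary demo choices:

```python
from random import random, randrange, seed

def random_subset(s, criterion=lambda x: randrange(2)):
    return set(filter(criterion, s))

def make_criterion(threshold=0.5):
    # The returned function keeps an element with probability `threshold`.
    return lambda x: random() < threshold

seed(0)  # fixed seed only to make the demo repeatable
s = set(range(10_000))
small = random_subset(s, make_criterion(0.1))
print(len(small))  # expected to be around 1_000 (0.1 * 10_000)
```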
In fact, you can make the criterion as complicated as you would like. The following example is a contrived callable class that operates on sets of strings. If a string with a matching first character has already been added, it automatically rejects the current element. If the second letter has been seen already, it sets the probability of inclusion to 0.25. Otherwise, it flips a coin:
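```python
from random import randrange

class WeirdCriterion:
    def __init__(self):
        self.first = set()   # first characters of strings already added
        self.second = set()  # second characters seen so far

    def __call__(self, x):
        # Reject if a string with the same first character was already added.
        if x and x[0] in self.first:
            return False
        if len(x) > 1 and x[1] in self.second:
            keep = randrange(4) == 0  # second letter seen before: keep with p = 0.25
        else:
            if len(x) > 1:
                self.second.add(x[1])
            keep = randrange(2) == 1  # otherwise flip a fair coin
        # Record the first character only for strings that are actually added.
        if keep and x:
            self.first.add(x[0])
        return keep
```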
This example wouldn't be very good in practice because sets are unordered: their iteration order can differ between runs of the same script (for strings, because of hash randomization), so a criterion that depends on earlier elements can give different results. What it shows, however, is a method for creating a criterion that is random, but is adjusted in response to elements that are already in the subset.
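One way to address the ordering caveat is to feed a stateful criterion the elements in a fixed order (here `sorted`) and seed the generator, so it sees the same sequence on every run. To keep the block self-contained it uses a simpler hypothetical criterion, `FirstLetterOnce`, rather than the class above; `reproducible_subset` and the word list are also illustrative:

```python
import random

class FirstLetterOnce:
    """Hypothetical stateful criterion: keep at most one string per first
    letter, deciding each candidate with a coin flip."""

    def __init__(self):
        self.seen_first = set()

    def __call__(self, x):
        if x and x[0] in self.seen_first:
            return False
        keep = random.randrange(2) == 1
        if keep and x:
            self.seen_first.add(x[0])
        return keep

def reproducible_subset(s, criterion, rng_seed):
    random.seed(rng_seed)  # same seed -> same coin flips
    # sorted() gives a fixed order, so the criterion sees the same sequence
    return set(filter(criterion, sorted(s)))

words = {"apple", "avocado", "banana", "blueberry", "cherry"}
first = reproducible_subset(words, FirstLetterOnce(), 42)
second = reproducible_subset(words, FirstLetterOnce(), 42)
print(first, second)  # the same subset on every run
```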
Avoiding NumPy
Now that I have a better understanding of your original intent: you can leverage the fact that Python 3 integers have arbitrary precision (so large factorials are exact) and that `choices` accepts a `weights` parameter, and use them to draw the subset length with the correct binomial weights. I don't recommend this approach, though:
```python
from random import choices, sample
from math import factorial

def random_subset(s):
    items = list(s)  # sample() needs a sequence in Python 3.11+
    n = len(items)
    nf = factorial(n)
    # yes, there are better ways of doing this, even in pure python
    # integer division keeps the binomial coefficients exact
    weights = [nf // (factorial(k) * factorial(n - k)) for k in range(n + 1)]
    length = choices(range(n + 1), weights, k=1)[0]
    return set(sample(items, length))
```
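For comparison, a sketch of the same weighted-length idea using `math.comb` (Python 3.8+) for exact integer weights, followed by `random.sample` on a list copy of the set. `random_subset_by_size` is an illustrative name. Because `random.choices` works with the total weight (here 2^n) as a float, this only suits moderate `n`, roughly below 1024:

```python
import random
from math import comb

def random_subset_by_size(s):
    """Pick a size k with weight C(n, k), then a uniform k-element subset."""
    items = list(s)  # random.sample requires a sequence in Python 3.11+
    n = len(items)
    weights = [comb(n, k) for k in range(n + 1)]  # exact binomial coefficients
    k = random.choices(range(n + 1), weights=weights, k=1)[0]
    return set(random.sample(items, k))

print(random_subset_by_size({"a", "b", "c", "d"}))
```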
A better solution for computing the binomial coefficients could be something like:
```python
def pascal(n):
    result = [1] * (n + 1)
    if n < 2:
        return result
    for i in range(2, n + 1):
        for j in range(i - 1, 0, -1):
            result[j] += result[j - 1]
    return result
```
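A sketch of plugging `pascal(n)` into the weighted-length approach in place of the factorials. `random_subset_pascal` is an illustrative name, and `pascal` is repeated so the block runs on its own:

```python
import random

def pascal(n):
    # Row n of Pascal's triangle, built in place: [C(n, 0), ..., C(n, n)].
    result = [1] * (n + 1)
    if n < 2:
        return result
    for i in range(2, n + 1):
        for j in range(i - 1, 0, -1):
            result[j] += result[j - 1]
    return result

def random_subset_pascal(s):
    items = list(s)  # random.sample needs a sequence in Python 3.11+
    n = len(items)
    k = random.choices(range(n + 1), weights=pascal(n), k=1)[0]
    return set(random.sample(items, k))

print(pascal(5))  # [1, 5, 10, 10, 5, 1]
print(random_subset_pascal({1, 2, 3, 4, 5}))
```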