Got this question from algorithms design manual by Steven Skiena.
It is required to select k (value given) numbers to form a subset S' from a given set S having n numbers, such that selection probability for each number is equal (k/n). n is unknown (i was thinking of taking S as a link-list for this). also, we can have only pass through the set S.
When n is unknown you'd rather need an on-line algorithm for so-called Reservoir sampling.
The good explanation & proof sketches are provided here http://propersubset.com/2010/04/choosing-random-elements.html
I mean this algorithm implemented in Python (taken from the link above)
import random
def random_subset( iterator, K ):
result = []
N = 0
for item in iterator:
N += 1
if len( result ) < K:
result.append( item )
else:
s = int(random.random() * N)
if s < K:
result[ s ] = item
return result
Something like this
for elem in S
if random() < (k - S'.size)/S.size // This is float division
S'.add(elem)
The first element is chosen with probability k/n
, the second one with (n-k)/n * k/(n-1) + k/n * (k-1)/(n-1)
which reduces to k/n
, etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With