I understand similar questions have been asked like
Choose m elements randomly from a vector containing n elements
Pick N items at random from sequence of unknown length
But the more I look the more I got confused.
Uniformly and randomly choose M elements from N elements
So I need to pick M elements from N ones. And also I need to make the probability of being selected to be uniformly distributed for every element: M/N
My intuition was
I guess this solution is wrong? The probabilities for the M
selected elements are 1/N
, 1/(N-1)
, ..., 1/(N-M)
, not M/N
, am I correct?
A possible correct solution is that
But I can't figure out the probability of being selected for each element.
Can anyone demonstrate the probability of this solution is M/N
indeed?
Also of course we can use Reservoir sampling, and its probability is M / N
.
I think your confusion there may be that you're not discriminating between choosing a sequence, from choosing a set.
In your first procedure, just because you only have a 1/N
chance of choosing a particular element in the first round, doesn't mean you won't choose it in a subsequent round. The element has a 1/N
chance of being the first element in the result... but it has a M/N
chance of being chosen during some round. So that works. Take M=2, N=4: The chance of an element being picked is 1/4 + (3/4)*(1/3) = 2/4
.
As for your second procedure, following the shuffle, each element's position within the array is uniformly distributed, so there's a M/N
chance that its position is equal to or less than M (and is hence chosen). So that works too.
The probabilities for elements being picked:
First: 1/N;
Second: 1/(N-1) * (1 - 1/N) = 1/N
...
where (1 - 1/N)
is the probability of the second item not being picked at the first step, which is the condition for it to be picked at the second step. Here 1/N
is the probability of some element being selected at the first step, and (1 - 1/N)
- probability of element selected at the second step being available for selection at the second step.
Third: 1/(N-2) * (1 - 1/N - 1/N) = 1/N
where (1 - 1/N - 1/N)
is the probability of an element being available for selection at the third step.
And so on.
The point here is that for any element the probability of selection at a step is 1/N
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With