For concreteness, how would you read text line, and select and print one random line, when you don't know the number of lines in advance?
Yes this is a problem from the programming pearl which I get confused.
The solution choose the 1st element, then select the second with probability 1/2, the third with 1/3, and so forth.
An algorithm:
i = 0
while more input lines
with probability 1.0/++i
choice = this input line
print choice
Suppose the final choice is the 3rd element, the probability is 1 x 1/2 x 1/3 x 3/4 x ... x n-2/n-1 x n-1/n == 1/2n ? But 1/n should be correct.
Your algorithm is correct, but the analysis is not. You can prove it by induction. Loosely: It works for N = 1 of course. If it works up to N-1, then what happens at N? The chance that the Nth element is chosen and overwrites the last choice is 1/N -- good. The chance that it isn't is (N-1)/N. In which case the choice from the previous step is used. But at that point all elements had an 1/(N-1) chance of being chosen. Now it's 1/N. Good.
Your calculation is wrong :
Suppose the final choice is the 3rd element, the probability is 1 x 1/2 x 1/3 x 3/4 x ... x n-2/n-1 x n-1/n
The real probability is :
(1 x 1/2 + 1 x 1/2) x 1/3 x 3/4 x ... x n-2/n-1 x n-1/n == 1/n
since either you chose 2 or you don't chose 2 (chosing 2 has a proba of 1/2 and not chosing 2 a proba of 1/2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With