On average, how many clicks would be required to see all the questions in the database?
What is the approach required for such questions?
See coupon collector's problem.
This is a mathematical question rather than an algorithmic one. As sdcvvc said, this is the famous coupon collector's problem.
Suppose you have n questions to "gather". Let X be a random variable denoting the required number of clicks. If we define Xi to be the number of clicks from the moment we have (i-1) questions to the moment we have i questions, then:
X = X1 + X2 + ... + Xn
Due to the linearity of the expected value:
E(X) = E(X1 + X2 + ... + Xn) = EX1 + EX2 + ... + EXn
If we inspect Xi, we see that in fact it has geometric distribution with p=(n-i+1)/n, hence a mean value of n/(n-i+1). Therefore:
EX = n * (1/n + 1/(n-1) + ... + 1/2 + 1/1) = n * Hn
Where Hn is the nth Harmonic number, which can be approximated by ln n:
EX ~= n * ln n
You can run a simple simulation and test this approximation.
Why don't we run a simulation and found out?
<?php
function simulate($size) {
$range = range(1, $size);
$hits = 0;
$hit = array();
while(count($hit) != $size) {
$rand = array_rand($range);
$hit[$rand] = 1;
$hits++;
}
return $hits;
}
for ($j=10; $j<101; $j+=10) {
$res = array();
for ($i=0; $i<10; $i++) {
$res[] = simulate($j);
}
echo "for size=$j, avg=" . array_sum($res)/10 . "<br />";
}
Output:
for size=10, avg=35.9
for size=20, avg=68.8
for size=30, avg=123.3
for size=40, avg=176.9
for size=50, avg=205.9
for size=60, avg=276.8
for size=70, avg=304.9
for size=80, avg=401.9
for size=90, avg=371
for size=100, avg=617.7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With