Thats the problem:
Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability.
So why the "oficial" solution is this:
import random
def pickRandom(stream):
random_element = None
for i, e in enumerate(stream):
if i == 0:
random_element = e
elif random.randint(1, i + 1) == 1:
random_element = e
return random_element
and not this:
import random
random_element = stream[random.randint(0,len(stream))]
To expand upon my comment:
I would guess it's because you can't know len(stream) without exhausting (reading all the items from) it. If you imagine that the stream is a network socket, where someone is sending you a bunch of data items and then closing the socket, you can only read the data from the socket once. You can't make a copy of all the data (because it wont fit in memory (and, in spite of the lack of context, I would take this to also mean it won't fit on disk either). so you effectively have 1 look at each data item and then it's lost.
The 'oficial' (sic) solution is based on a clever mathematical trick. As an aside, this sort of question is the sort of thing I would expect to see in a horrible company's technical/coding interview test, and would make me run a mile.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With