Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the big deal of pick a random element from a big stream?

Tags:

python

Thats the problem:

Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability.

So why the "oficial" solution is this:

import random
def pickRandom(stream):
    random_element = None
    for i, e in enumerate(stream):
        if i == 0:
            random_element = e
        elif random.randint(1, i + 1) == 1:
            random_element = e
    return random_element

and not this:

import random
random_element = stream[random.randint(0,len(stream))]
like image 646
Marcos Avatar asked Nov 15 '19 15:11

Marcos


1 Answers

To expand upon my comment:

I would guess it's because you can't know len(stream) without exhausting (reading all the items from) it. If you imagine that the stream is a network socket, where someone is sending you a bunch of data items and then closing the socket, you can only read the data from the socket once. You can't make a copy of all the data (because it wont fit in memory (and, in spite of the lack of context, I would take this to also mean it won't fit on disk either). so you effectively have 1 look at each data item and then it's lost.

The 'oficial' (sic) solution is based on a clever mathematical trick. As an aside, this sort of question is the sort of thing I would expect to see in a horrible company's technical/coding interview test, and would make me run a mile.

like image 159
Tom Dalton Avatar answered Oct 23 '22 19:10

Tom Dalton