Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I return a random line from a file? Interview Question [closed]

Tags:

c++

c

I am preparing for a phone interview. I came upon these questions on the internet. Can anyone tell me some good answers for these?

  1. Suppose I give you a text file and ask you a to write a program that will return a random line from the file (all lines must have equal probability to be returned)

  2. Same as part 1, except this time the entire text file cannot fit into main memory

  3. Same as part 2, except now you have a stream instead of a file.

Please help.

Ok...@Everyone, I really had a few ideas in my mintod before asking this...Seeing the relentless attack by my fellow SOers, I am posting my answers. Please feel free to attack them too...

1: Count the number of '\n' in the file. Generate a random number between 1 and the number and return the line after the number-1 '\n'.

2: Bring the file into main memory part by part and follow the above procedure.

3: I dont have much idea about this and would appreciate any inputs.

Its wonderful that you guys really give an inspiration to push forward.....

like image 475
Race Avatar asked Jan 06 '10 21:01

Race


2 Answers

  1. Read all lines into an array, return a random line in the range of 1 and the amount of lines.

  2. Most simple: Count the lines, choose a line number at random, go through the file a second time and return it.

  3. You just have to remember one line. Each new line has a probability of 1/N (N being lines read).

    Pseudocode:

    i = 1
    chosen_line = ""
    for line in lines:
        if random() < 1/i:    # random returns a uniform random number in [0,1)
            chosen_line = line
        i += 1
    return chosen_line
    

Algorithm number 3 could be used for 1 and 2 too.

like image 131
Georg Schölly Avatar answered Oct 02 '22 22:10

Georg Schölly


You can do this without having to read all the lines in memory, thus working well for huge files. Pseudocode:

linenum := 0
ret := ''
while more lines to read:
    line := readline()
    linenum := linenum + 1
    r := uniform_random(0, linenum)
    if r < 1:
        ret := line

return ret

Proof: We begin by noting that we always save the first line in ret. If the file has one line, you are going to choose it, and you're done.

For two-line file, ret will save the first line 100% of the times, and the second line will be saved in ret 50% of the time during the second iteration of the loop. Thus, each line has a probability of 0.5 of being selected.

Now, let's assume that this method works for files of ≤N lines. To prove that this means it works for N+1, in the (N+1)th iteration of the loop, there is a probability of 1/(N+1) that the last line will be selected (random(0, N+1) < 1 has that probability). Thus, the last line has 1/(N+1) probability of being selected. The probability of all other lines being selected is still going to be equal to each other, let's call this x. Then, N*x + 1/(N+1) == 1, which means that x = 1/(N+1).

Proof by induction is complete.

Edit: Oops, didn't see the first answer's third method before replying. Still, I will keep this post here if only for the proof, and the opportunity for other people to correct it if there are any errors in it.

like image 24
Alok Singhal Avatar answered Oct 02 '22 22:10

Alok Singhal