Is there a built-in method to pick a random line from a file? If not, how can I do this without too much overhead?
The readline() method. readline() is a built-in method of file objects that returns one line from the file. Open the file with open(filename, mode) using mode "r", then call readline() on the resulting file object to get the first line of the file.
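As a minimal sketch of the readline() approach described above: the helper name first_line and the throwaway file contents are made up for illustration. Note that this reads only the first line; it does not pick a random one.

```python
import os
import tempfile


def first_line(path):
    """Return the first line of the file at `path`, without its newline."""
    with open(path, "r") as f:
        # readline() returns the next line, or "" if the file is empty.
        return f.readline().rstrip("\n")


# Demo on a throwaway file (hypothetical contents, removed afterwards).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")
result = first_line(tmp.name)
os.remove(tmp.name)
print(result)  # first
```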
Not built-in, but algorithm R(3.4.2) (Waterman's "Reservoir Algorithm") from Knuth's "The Art of Computer Programming" is good (in a very simplified version):
import random

def random_line(afile):
    line = next(afile)
    for num, aline in enumerate(afile, 2):
        if random.randrange(num):
            continue
        line = aline
    return line
The num, ... in enumerate(..., 2) iterator produces the sequence 2, 3, 4... The randrange call will therefore return 0 with a probability of 1.0/num, and that is exactly the probability with which we must replace the currently selected line. This is the special case of sample size 1 of the referenced algorithm (see Knuth's book for the proof of correctness), and of course we're also in the case of a small-enough "reservoir" to fit in memory ;-).
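To see the 1.0/num replacement rule at work, here is a hedged sketch of a variant of the same idea, together with a quick empirical check that every line is picked about equally often. The default parameter (returned for empty input) and the selection_counts helper are assumptions added for illustration, and io.StringIO stands in for a real file.

```python
import io
import random
from collections import Counter


def random_line(afile, default=None):
    """Return one line chosen uniformly at random from an iterable of lines.

    `default` is a hypothetical addition: it is returned for empty input.
    """
    chosen = default
    for num, aline in enumerate(afile, 1):
        # randrange(num) is 0 with probability 1/num, so the num-th line
        # replaces the current choice with exactly that probability.
        # For num == 1 this always happens, so the first line is taken.
        if random.randrange(num) == 0:
            chosen = aline
    return chosen


def selection_counts(text, trials):
    """Count how often each line of `text` is picked over `trials` runs."""
    counts = Counter()
    for _ in range(trials):
        counts[random_line(io.StringIO(text))] += 1
    return counts
```

Calling selection_counts("a\nb\nc\n", 30000) should give each of the three lines a count close to 10000. The file is read once and only one line is kept in memory at a time, which is what keeps the overhead low.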