I have a very large text file, more than 30 GB in size. For some reason, I want to read the lines between 1000000 and 2000000 and compare each one with a user input string. If it matches, I need to write the line content to another file.
I know how to read a file line by line.
input_file = open('file.txt', 'r')
for line in input_file:
    print(line)
But if the file is this large, reading it line by line really affects performance, right? How do I address this in an optimized way?
You can use itertools.islice:
from itertools import islice

with open('file.txt') as fin:
    lines = islice(fin, 1000000, 2000000)  # or whatever range you need
    for line in lines:
        ...  # do something with each line
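Applied to the task in the question (keep only lines from that range that match a user-supplied string, and write them out), here is a minimal sketch; the output filename and the substring-match rule are assumptions, so adjust them to your needs:

from itertools import islice

needle = input('Search string: ')

with open('file.txt') as fin, open('matches.txt', 'w') as fout:
    # Skip the first 1,000,000 lines, then stop after line 2,000,000.
    for line in islice(fin, 1000000, 2000000):
        if needle in line:  # substring match; tighten the rule if you need exact matches
            fout.write(line)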
Of course, if your lines are a fixed length, you can use that to fin.seek() directly to the start of the line you want. Otherwise, the approach above still has to read the first n lines before islice starts producing output; it is just a really convenient way to limit the range.
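For illustration, a sketch of the fixed-length case. It assumes every line, newline included, is exactly LINE_LEN bytes, which is an assumption about your data; the file is opened in binary mode because text-mode seek() in Python 3 only accepts offsets previously returned by tell():

LINE_LEN = 80  # assumption: every line, newline included, is exactly 80 bytes
START, STOP = 1000000, 2000000

with open('file.txt', 'rb') as fin:
    # Jump straight to the first wanted line instead of reading and
    # discarding the million lines before it.
    fin.seek(START * LINE_LEN)
    for _ in range(STOP - START):
        line = fin.readline()
        if not line:  # hit end of file early
            break
        # line is bytes here; call line.decode() if you need str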