I'm trying to analyse two ~6 GB files. I need to analyse them simultaneously, because I need two lines at the same time (one from each file). I tried something like this:
with open(fileOne, "r") as First_file:
    for index, line in enumerate(First_file):
        # Do some stuff here
        with open(fileTwo, "r") as Second_file:
            for index, line in enumerate(Second_file):
                # Do stuff here as well
The problem is that the second "with open" loop starts at the beginning of the file on every iteration of the outer loop, so the analysis would take far too long. I also tried this:
with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(zip(f1, f2)):
        # Do stuff here
The problem is that both files are loaded into memory in their entirety. I need the same line from each file. The lines I want satisfy:

number_line % 4 == 1

Since enumerate counts from 0, this gives the indices 1, 5, 9, 13, etc., i.e. lines 2, 6, 10, 14 of each file. I need those lines from both files.
Is there a faster and more memory-efficient way to do this?
In Python 2, use itertools.izip() to prevent the files from being loaded into memory:
from itertools import izip

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(izip(f1, f2)):
        # line_R1 and line_R2 are the corresponding lines from each file
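Note that izip() stops as soon as the shorter of the two files is exhausted, so the loop only ever pairs lines that exist in both files.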
In Python 2, the built-in zip() function reads both file objects into memory in their entirety, whereas izip() retrieves lines one at a time. (In Python 3, zip() is already lazy, so it can be used as-is.)
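Combining izip() with the index check from the question, a minimal sketch of the full loop might look like this (assuming Python 2 and the fileOne / fileTwo names from the question; process_pair is a hypothetical stand-in for the actual analysis):

from itertools import izip

def process_pair(line_R1, line_R2):
    # Hypothetical placeholder for the real per-pair analysis.
    pass

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(izip(f1, f2)):
        # enumerate counts from 0, so index % 4 == 1 selects
        # lines 2, 6, 10, 14, ... of both files.
        if index % 4 == 1:
            process_pair(line_R1, line_R2)

As a possible alternative, itertools.islice() can skip the unwanted lines directly, e.g. izip(islice(f1, 1, None, 4), islice(f2, 1, None, 4)); this also reads lazily and avoids the index check entirely.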