I have two large (~100 GB) text files that must be iterated through simultaneously.
zip() works well for smaller files, but I found out that it actually builds a list of line pairs from my two files, which means every line gets stored in memory. I don't need to do anything with the lines more than once.
handle1 = open('filea', 'r')
handle2 = open('fileb', 'r')
for i, j in zip(handle1, handle2):
    # do something with i and j
    # write to an output file
    # no need to do anything with i and j after this
    pass
Is there an alternative to zip() that acts as a generator and lets me iterate through these two files without using more than 200 GB of RAM?
When using write in both, there's no difference whatsoever. No, it's not faster. Only write seems to be faster than print. Your solution seemed to be about map instead of zip, not write instead of print.
Use the izip() function to iterate over two lists in Python 2. It iterates until the shorter list is exhausted, pairing the elements of both lists by index, and returns an iterator object rather than a list.
In Python 3, the built-in zip() function creates an iterator that aggregates elements from two or more iterables, so it does not build the whole list up front. You can use the resulting iterator to quickly and consistently solve common programming problems, like creating dictionaries.
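For the two-file case in the question, a minimal Python 3 sketch might look like the following. The function name merge_lines, the output path argument, and the tab-joined output format are assumptions made for illustration; the question does not say what is done with each pair of lines.

```python
def merge_lines(path_a, path_b, path_out):
    """Stream two text files in lockstep, writing one combined line per pair.

    Hypothetical helper: the tab-joined output is only a placeholder for
    whatever "do something with i and j" means in the question.
    """
    pairs = 0
    with open(path_a, 'r') as fa, open(path_b, 'r') as fb, \
            open(path_out, 'w') as out:
        # In Python 3, zip() pulls one line from each file per iteration,
        # so only the current pair of lines is held in memory.
        for a, b in zip(fa, fb):
            out.write(a.rstrip('\n') + '\t' + b.rstrip('\n') + '\n')
            pairs += 1
    return pairs
```

Because file objects are themselves lazy line iterators, memory use here depends on the length of the longest line, not on the size of the files. Iteration stops when the shorter file runs out.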
You can use izip_longest like this to pad the shorter file with empty lines.
In Python 2.6+:
from itertools import izip_longest
with open('filea', 'r') as handle1:
    with open('fileb', 'r') as handle2:
        for i, j in izip_longest(handle1, handle2, fillvalue=""):
            ...
Or in Python 3+:
from itertools import zip_longest
with open('filea', 'r') as handle1, open('fileb', 'r') as handle2:
    for i, j in zip_longest(handle1, handle2, fillvalue=""):
        ...
itertools has a function, izip, that does that:
from itertools import izip
for i, j in izip(handle1, handle2):
    ...
If the files are of different sizes you may use izip_longest, as izip will stop at the smaller file.
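The difference between stopping at the shorter input and padding it is easiest to see on a small in-memory stand-in for the two files. This is a sketch in Python 3, where the lazy equivalents are the built-in zip and itertools.zip_longest; the two sample lists are made up for illustration.

```python
from itertools import zip_longest

# Stand-ins for two files with different numbers of lines.
short = ["a1\n", "a2\n"]
longer = ["b1\n", "b2\n", "b3\n"]

# zip() stops as soon as the shorter input is exhausted.
stopped_early = list(zip(short, longer))

# zip_longest() keeps going and fills the missing side with fillvalue.
padded = list(zip_longest(short, longer, fillvalue=""))

print(len(stopped_early), len(padded))
print(padded[-1])
```

The list() calls are only there to make the result visible; with real files you would loop over the iterator directly so that nothing is accumulated in memory.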