I am trying to concatenate pieces of specific lines from two files. For example, I want to append something from line 2 of file2 onto line 2 of file1, then from line 6 of file2 onto line 6 of file1, and so on. Is there a way to iterate through both files simultaneously to do this? (It might be helpful to know that the input files are about 15GB each.)
Here is a simplified example:
File 1:
Ignore
This is a
Ignore
Ignore
Ignore
This is also a
Ignore
Ignore
File 2:
Ignore
sentence
Ignore
Ignore
Ignore
sentence
Ignore
Ignore
Output file:
Ignore
This is a sentence
Ignore
Ignore
Ignore
This is also a sentence
Ignore
Ignore
Python3:
    with open('bigfile_1') as bf1:
        with open('bigfile_2') as bf2:
            for line1, line2 in zip(bf1, bf2):
                process(line1, line2)
Importantly, bf1 and bf2 do not read the entire file into memory at once. They are iterators that produce one line at a time. zip() works fine with iterators and itself returns an iterator, in this case yielding pairs of lines for you to process. Note that zip() stops as soon as the shorter file runs out. Using with ensures the files are closed afterwards.
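As a rough sketch of what process() might look like for the example in the question, the function below appends the matching line of the second file to every fourth line of the first file, starting at line 2, and copies all other lines from the first file unchanged. The function name merge_selected_lines, its parameters first and step, and the choice of a single space as the separator are assumptions made for illustration; they are not part of the original answer.

```python
def merge_selected_lines(path1, path2, out_path, first=2, step=4):
    """Join file2's line onto file1's line for lines first, first+step, ...

    All other lines are copied from file1 as-is. Both inputs are read
    lazily, one line at a time, so memory use stays small.
    """
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        # enumerate(..., start=1) yields 1-based line numbers; zip stays lazy.
        for lineno, (line1, line2) in enumerate(zip(f1, f2), start=1):
            if lineno >= first and (lineno - first) % step == 0:
                # Strip both newlines, join with a space, end with one newline.
                out.write(line1.rstrip("\n") + " " + line2.rstrip("\n") + "\n")
            else:
                out.write(line1)
```

Calling merge_selected_lines('bigfile_1', 'bigfile_2', 'merged') with the defaults targets lines 2, 6, 10, and so on. Because it relies on zip(), any extra lines in the longer file are silently dropped if the two files differ in length.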
Python 2.x
    import itertools

    with open('bigfile_1') as bf1:
        with open('bigfile_2') as bf2:
            for line1, line2 in itertools.izip(bf1, bf2):
                process(line1, line2)
Python 2.x can't use zip the same way: there, zip builds a complete list instead of a lazy iterator, so it would try to hold both 15GB files in memory. Use itertools.izip, the lazy version of zip, instead.