
Iterate over the lines of two files simultaneously

I am trying to concatenate specific lines from two files. For example, I want to append something from line 2 of file 2 onto line 2 of file 1, then from line 6 of file 2 onto line 6 of file 1, and so on. Is there a way to iterate through both files simultaneously to do this? (It may help to know that the input files are about 15 GB each.)

Here is a simplified example:

File 1:

Ignore
This is a
Ignore
Ignore
Ignore
This is also a
Ignore
Ignore

File 2:

Ignore
sentence
Ignore
Ignore
Ignore
sentence
Ignore
Ignore

Output file:

Ignore
This is a sentence
Ignore
Ignore
Ignore
This is also a sentence
Ignore
Ignore
The Nightman asked Dec 03 '22 16:12



1 Answer

Python 3:

# process() is a placeholder for whatever you do with each pair of lines.
with open('bigfile_1') as bf1, open('bigfile_2') as bf2:
    for line1, line2 in zip(bf1, bf2):
        process(line1, line2)

Importantly, bf1 and bf2 do not read the entire file into memory at once. File objects are iterators that produce one line at a time.

zip() works fine with iterators and produces an iterator itself, in this case one that yields pairs of lines for you to process.
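One thing to watch for: zip() stops as soon as either input runs out, so a length mismatch between the two files goes unnoticed. A small sketch of the difference, using in-memory stand-ins for the files (the sample data is made up for illustration) and itertools.zip_longest from the standard library:

```python
import io
from itertools import zip_longest

# In-memory stand-ins for two files of different lengths (illustrative data).
short = io.StringIO("a1\na2\n")
longer = io.StringIO("b1\nb2\nb3\n")

# zip() stops silently at the end of the shorter input.
pairs = list(zip(short, longer))

# Rewind both inputs, then pair them again with zip_longest(), which keeps
# going and pads the exhausted side with fillvalue.
short.seek(0)
longer.seek(0)
padded = list(zip_longest(short, longer, fillvalue=None))
```

If the two files are supposed to line up exactly, checking for the fill value inside the loop is one way to catch a truncated input early.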

Using with ensures the files are closed afterwards, even if an exception is raised along the way.
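Applied to the example in the question, process() could look something like the sketch below. The function name merge_selected_lines and its parameters are hypothetical. It assumes the pattern is "line 2, then every 4th line after that" and that the pieces are joined with a single space, both of which are read off the sample input rather than stated in the question.

```python
def merge_selected_lines(path1, path2, out_path, first=2, step=4, sep=" "):
    """Append file 2's line to file 1's line on lines first, first+step, ...

    first, step and sep are assumptions taken from the example in the
    question (lines 2, 6, ... joined with a space).
    """
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        # enumerate(..., 1) gives 1-based line numbers; zip() reads lazily,
        # so only one line from each file is held in memory at a time.
        for lineno, (line1, line2) in enumerate(zip(f1, f2), 1):
            if lineno >= first and (lineno - first) % step == 0:
                # Drop file 1's newline and keep file 2's, so the merged
                # line still ends with a newline.
                out.write(line1.rstrip("\n") + sep + line2)
            else:
                out.write(line1)
```

Called as merge_selected_lines('bigfile_1', 'bigfile_2', 'merged'), it writes the output line by line, so memory use stays flat regardless of file size.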

Python 2.x:

import itertools

with open('bigfile_1') as bf1:
    with open('bigfile_2') as bf2:
        for line1, line2 in itertools.izip(bf1, bf2):
            process(line1, line2)

Python 2.x can't use zip the same way: there it builds a full list instead of returning a lazy iterator, which would eat all of your system memory with those 15 GB files. Use itertools.izip, the lazy iterator version of zip, instead.
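If the same script has to run on both versions, one common idiom is to rebind zip to izip when it is available. This is only a sketch; paired_lines is a hypothetical helper name, not something from the standard library.

```python
try:
    # Python 2: replace the list-building zip with the lazy itertools.izip.
    from itertools import izip as zip
except ImportError:
    # Python 3: itertools.izip does not exist; the built-in zip is already lazy.
    pass


def paired_lines(path1, path2):
    """Yield (line1, line2) pairs lazily on either Python version."""
    with open(path1) as f1, open(path2) as f2:
        for pair in zip(f1, f2):
            yield pair
```

The loop then becomes for line1, line2 in paired_lines('bigfile_1', 'bigfile_2'), and both files are closed once the generator is exhausted.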

Kenan Banks answered Dec 05 '22 07:12
