I'm trying to analyse two ~6 GB files. I need to analyse them simultaneously, because I need two lines at the same time (one from each file). I tried something like this:
with open(fileOne, "r") as First_file:
    for index, line in enumerate(First_file):
        # Do some stuff here
        with open(fileTwo, "r") as Second_file:
            for index, line in enumerate(Second_file):
                # Do stuff here as well
The problem is that the second "with open" loop starts at the beginning of the file on every iteration of the outer loop, so the analysis would take far too long. I also tried this:
with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(zip(f1, f2)):
        # Do stuff here
The problem is that both files are loaded into memory in their entirety. I need the same line from each file. The lines I want satisfy:

number_line % 4 == 1

Since enumerate counts from 0, this gives the indices 1, 5, 9, 13, etc., i.e. lines 2, 6, 10, 14 of each file. I need those lines from both files.
Is there a faster and more memory-efficient way to do this?
In Python 2, use itertools.izip() to prevent the files from being loaded into memory:
from itertools import izip

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(izip(f1, f2)):
        # line_R1 and line_R2 are the corresponding lines from each file
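Note that izip() stops as soon as the shorter of the two files is exhausted, so the loop only ever pairs lines that exist in both files.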
In Python 2, the built-in zip() function reads both file objects into memory in their entirety, whereas izip() retrieves lines one at a time. (In Python 3, zip() is already lazy, so it can be used as-is.)
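Combining izip() with the index check from the question, a minimal sketch of the full loop might look like this (assuming Python 2 and the fileOne / fileTwo names from the question; process_pair is a hypothetical stand-in for the actual analysis):

from itertools import izip

def process_pair(line_R1, line_R2):
    # Hypothetical placeholder for the real per-pair analysis.
    pass

with open(fileOne, "r") as f1, open(fileTwo, "r") as f2:
    for index, (line_R1, line_R2) in enumerate(izip(f1, f2)):
        # enumerate counts from 0, so index % 4 == 1 selects
        # lines 2, 6, 10, 14, ... of both files.
        if index % 4 == 1:
            process_pair(line_R1, line_R2)

As a possible alternative, itertools.islice() can skip the unwanted lines directly, e.g. izip(islice(f1, 1, None, 4), islice(f2, 1, None, 4)); this also reads lazily and avoids the index check entirely.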