diff two big files in Python

Question

I have two big text files, about 2 GB each. I need something like diff f1.txt f2.txt. Is there a fast way to do this in Python? The standard difflib is too slow. I assume a faster way exists, because difflib is implemented entirely in Python.

Senthil Kumaran · Accepted Answer

How about using difflib in a way that lets your script handle big files? Don't load the files into memory; instead, iterate through the lines of both files and diff them in chunks, e.g. 100 lines at a time.

import difflib
from itertools import islice

CHUNK_SIZE = 100  # number of lines compared at a time

d = difflib.Differ()

with open('bigfile1') as f1, open('bigfile2') as f2:
    while True:
        # Read the next chunk of lines from each file; only these
        # lines are held in memory.
        b1 = list(islice(f1, CHUNK_SIZE))
        b2 = list(islice(f2, CHUNK_SIZE))
        if not b1 and not b2:
            break  # both files are exhausted
        # compare() expects sequences of lines, not joined strings.
        diff = d.compare(b1, b2)
        print(''.join(diff), end='')
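
The chunked idea can also be wrapped in a generator that skips identical chunks without calling difflib at all, which may help when the two files are mostly the same. The following is only a sketch: the function name diff_in_chunks, the chunk header format, and the choice to drop unchanged context lines are illustrative assumptions, not part of the original answer. Like any fixed-size chunking, it assumes the files stay roughly line-aligned; a line inserted early in one file shifts every later chunk and will show up as many spurious differences.

```python
import difflib
from itertools import islice


def diff_in_chunks(path1, path2, chunk_size=100):
    """Yield diff lines for two files, comparing chunk_size lines at a time.

    Hypothetical helper for illustration; assumes the files are line-aligned.
    """
    with open(path1) as f1, open(path2) as f2:
        start = 1  # 1-based line number of the current chunk's first line
        while True:
            b1 = list(islice(f1, chunk_size))
            b2 = list(islice(f2, chunk_size))
            if not b1 and not b2:
                break  # both files are exhausted
            # Fast path: identical chunks need no difflib call.
            if b1 != b2:
                yield "@@ chunk starting at line %d @@\n" % start
                for line in difflib.ndiff(b1, b2):
                    # Keep only changed lines ('- ', '+ ', '? ').
                    if not line.startswith("  "):
                        yield line
            start += chunk_size
```

It could be used as: for line in diff_in_chunks('bigfile1', 'bigfile2'): print(line, end='')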

Tags:

python

diff

Mykola Kharechko
