
Compare two files and report the difference in Python

I have two files called "hosts" (in different directories).

I want to compare them using Python to see if they are identical. If they are not identical, I want to print the differences on the screen.

So far I have tried this:

    hosts0 = open(dst1 + "/hosts","r")
    hosts1 = open(dst2 + "/hosts","r")

    lines1 = hosts0.readlines()

    for i,lines2 in enumerate(hosts1):
        if lines2 != lines1[i]:
            print "line ", i, " in hosts1 is different \n"
            print lines2
        else:
            print "same"

But when I run this, I get:

      File "./audit.py", line 34, in <module>
        if lines2 != lines1[i]:
    IndexError: list index out of range

This means one of the hosts files has more lines than the other. Is there a better way to compare two files and report the differences?

Asked Oct 01 '13 at 15:10 by Matt


People also ask

How do you find the difference between two files in Python?

The Python standard library has a module, difflib, specifically for finding diffs between strings or files. To get a diff, pass the two sequences of lines to its unified_diff function.

Can we compare two files in Python?

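Yes. The standard library module filecmp provides filecmp.cmp(), which returns True if two files appear equal and False otherwise. A related function, filecmp.cmpfiles(), compares same-named files in two directories and returns three lists: the files that match, the files that mismatch, and the files that could not be compared.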
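As a rough sketch of the identical-or-not check: the helper name files_identical and the demo files below are made up for illustration and are not part of the original question. Passing shallow=False asks filecmp.cmp() to compare file contents rather than trusting matching os.stat() signatures.

```python
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Return True if both files have the same contents."""
    # shallow=False forces a content comparison instead of accepting
    # files whose os.stat() signatures (type, size, mtime) match.
    return filecmp.cmp(path_a, path_b, shallow=False)


def _write(path, text):
    with open(path, "w") as handle:
        handle.write(text)


# Hypothetical demo files, created only for this example.
demo_dir = tempfile.mkdtemp()
hosts_a = os.path.join(demo_dir, "hosts_a")
hosts_b = os.path.join(demo_dir, "hosts_b")
hosts_c = os.path.join(demo_dir, "hosts_c")
_write(hosts_a, "127.0.0.1 localhost\n")
_write(hosts_b, "127.0.0.1 localhost\n")
_write(hosts_c, "127.0.0.1 localhost\n10.0.0.5 db\n")

print(files_identical(hosts_a, hosts_b))
print(files_identical(hosts_a, hosts_c))
```

This only answers whether the files match; it does not say where they differ, which is what difflib is for.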


1 Answer

    import difflib

    lines1 = '''
    dog
    cat
    bird
    buffalo
    gophers
    hound
    horse
    '''.strip().splitlines()

    lines2 = '''
    cat
    dog
    bird
    buffalo
    gopher
    horse
    mouse
    '''.strip().splitlines()

    # Changes:
    # swapped positions of cat and dog
    # changed gophers to gopher
    # removed hound
    # added mouse

    for line in difflib.unified_diff(lines1, lines2, fromfile='file1',
                                     tofile='file2', lineterm=''):
        print(line)

Outputs the following:

    --- file1
    +++ file2
    @@ -1,7 +1,7 @@
    +cat
     dog
    -cat
     bird
     buffalo
    -gophers
    -hound
    +gopher
     horse
    +mouse

This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.

You can use n=0 to remove the context.

    for line in difflib.unified_diff(lines1, lines2, fromfile='file1',
                                     tofile='file2', lineterm='', n=0):
        print(line)

Outputting this:

    --- file1
    +++ file2
    @@ -0,0 +1 @@
    +cat
    @@ -2 +2,0 @@
    -cat
    @@ -5,2 +5 @@
    -gophers
    -hound
    +gopher
    @@ -7,0 +7 @@
    +mouse

But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.

    for line in difflib.unified_diff(lines1, lines2, fromfile='file1',
                                     tofile='file2', lineterm='', n=0):
        # Skip the file headers and the "@@" position markers.
        if not line.startswith(('---', '+++', '@@')):
            print(line)

Giving us this output:

    +cat
    -cat
    -gophers
    -hound
    +gopher
    +mouse

Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:

    diff = difflib.unified_diff(lines1, lines2, fromfile='file1',
                                tofile='file2', lineterm='', n=0)
    lines = list(diff)[2:]  # drop the '---' and '+++' header lines
    added = [line[1:] for line in lines if line[0] == '+']
    removed = [line[1:] for line in lines if line[0] == '-']

    print('additions:')
    for line in added:
        print(line)
    print('')
    print('additions, ignoring position:')
    for line in added:
        if line not in removed:
            print(line)

Outputting:

    additions:
    cat
    gopher
    mouse

    additions, ignoring position:
    gopher
    mouse
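The same bookkeeping can be applied to removals too, so that "hound" is not lost. The following is only a sketch: the function name net_changes is hypothetical, and it swaps the list membership test for collections.Counter subtraction so that repeated lines are counted rather than merely checked for presence.

```python
import difflib
from collections import Counter


def net_changes(old_lines, new_lines):
    """Return (added, removed) lines, ignoring lines that only moved."""
    diff = difflib.unified_diff(old_lines, new_lines, lineterm='', n=0)
    body = list(diff)[2:]  # drop the '---' and '+++' header lines
    added = Counter(line[1:] for line in body if line.startswith('+'))
    removed = Counter(line[1:] for line in body if line.startswith('-'))
    # Counter subtraction keeps only positive counts, so a line that was
    # removed in one place and added in another cancels out.
    net_added = sorted((added - removed).elements())
    net_removed = sorted((removed - added).elements())
    return net_added, net_removed


old = ['dog', 'cat', 'bird', 'buffalo', 'gophers', 'hound', 'horse']
new = ['cat', 'dog', 'bird', 'buffalo', 'gopher', 'horse', 'mouse']

net_added, net_removed = net_changes(old, new)
print('added:', net_added)
print('removed:', net_removed)
```

The results are sorted, so the original order of the lines in the file is not preserved.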

You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
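For the question as asked (two "hosts" files that may have different numbers of lines), one possible sketch reads both files and hands the lines to difflib.unified_diff, which copes with unequal lengths and so avoids the IndexError. The function name diff_files and the demo file contents are assumptions made for illustration.

```python
import difflib
import os
import tempfile


def diff_files(path_a, path_b):
    """Return unified-diff lines; an empty list means no differences."""
    with open(path_a) as file_a:
        lines_a = file_a.read().splitlines()
    with open(path_b) as file_b:
        lines_b = file_b.read().splitlines()
    return list(difflib.unified_diff(lines_a, lines_b, fromfile=path_a,
                                     tofile=path_b, lineterm=''))


def _write(path, text):
    with open(path, 'w') as handle:
        handle.write(text)


# Hypothetical demo files standing in for the two hosts files.
demo_dir = tempfile.mkdtemp()
path_a = os.path.join(demo_dir, 'hosts_a')
path_same = os.path.join(demo_dir, 'hosts_same')
path_b = os.path.join(demo_dir, 'hosts_b')
_write(path_a, '127.0.0.1 localhost\n10.0.0.5 db\n')
_write(path_same, '127.0.0.1 localhost\n10.0.0.5 db\n')
_write(path_b, '127.0.0.1 localhost\n10.0.0.5 db\n10.0.0.6 cache\n')

for other in (path_same, path_b):
    result = diff_files(path_a, other)
    if not result:
        print('identical')
    else:
        for line in result:
            print(line)
```

Because the lines are compared after splitlines(), a difference only in the final newline would not be reported by this sketch.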

Answered Oct 05 '22 at 23:10 by rbutcher