I have two files called "hosts" (in different directories).
I want to compare them using Python to see if they are identical. If they are not identical, I want to print the differences on the screen.
So far I have tried this:
    hosts0 = open(dst1 + "/hosts","r")
    hosts1 = open(dst2 + "/hosts","r")

    lines1 = hosts0.readlines()

    for i,lines2 in enumerate(hosts1):
        if lines2 != lines1[i]:
            print "line ", i, " in hosts1 is different \n"
            print lines2
        else:
            print "same"

But when I run this, I get
    File "./audit.py", line 34, in <module>
      if lines2 != lines1[i]:
    IndexError: list index out of range

Which means one of the hosts has more lines than the other. Is there a better method to compare 2 files and report the difference?
The Python standard library has a module, difflib, specifically for the purpose of finding diffs between strings/files. To get a diff with difflib, you can simply call its unified_diff function on the two lists of lines.
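The standard library also has a module called filecmp. Its filecmp.cmp() function returns True if two files appear identical and False otherwise. The related filecmp.cmpfiles() function compares a list of file names across two directories and returns three lists: matched files, mismatched files, and files that could not be compared.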
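For the "are they identical?" half of the question, a minimal sketch with filecmp.cmp could look like the following. The temporary directories, the write_file helper, and the file contents are stand-ins invented for illustration, not the real hosts files. shallow=False is passed so the file contents are compared rather than only the os.stat() signatures.

```python
import filecmp
import os
import tempfile


def write_file(directory, name, text):
    """Hypothetical helper: create a small text file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(text)
    return path


# Three stand-in "hosts" files in different (temporary) directories.
hosts_a = write_file(tempfile.mkdtemp(), "hosts", "127.0.0.1 localhost\n")
hosts_b = write_file(tempfile.mkdtemp(), "hosts", "127.0.0.1 localhost\n")
hosts_c = write_file(tempfile.mkdtemp(), "hosts",
                     "127.0.0.1 localhost\n10.0.0.5 build\n")

# shallow=False forces a byte-by-byte comparison of the contents.
same_ab = filecmp.cmp(hosts_a, hosts_b, shallow=False)
same_ac = filecmp.cmp(hosts_a, hosts_c, shallow=False)

print(same_ab)
print(same_ac)
```

This only answers yes or no; it does not say what differs, which is where difflib comes in.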
For example, using difflib.unified_diff:

    import difflib

    lines1 = '''
    dog
    cat
    bird
    buffalo
    gophers
    hound
    horse
    '''.strip().splitlines()

    lines2 = '''
    cat
    dog
    bird
    buffalo
    gopher
    horse
    mouse
    '''.strip().splitlines()

    # Changes:
    # swapped positions of cat and dog
    # changed gophers to gopher
    # removed hound
    # added mouse

    for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
        print(line)

Outputs the following:
    --- file1
    +++ file2
    @@ -1,7 +1,7 @@
    +cat
     dog
    -cat
     bird
     buffalo
    -gophers
    -hound
    +gopher
     horse
    +mouse

This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.
You can use n=0 to remove the context.
    for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
        print(line)

Outputting this:
    --- file1
    +++ file2
    @@ -0,0 +1 @@
    +cat
    @@ -2 +2,0 @@
    -cat
    @@ -5,2 +5 @@
    -gophers
    -hound
    +gopher
    @@ -7,0 +7 @@
    +mouse

But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.
    for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
        # Skip the file header and hunk position lines
        for prefix in ('---', '+++', '@@'):
            if line.startswith(prefix):
                break
        else:
            print(line)

Giving us this output:
    +cat
    -cat
    -gophers
    -hound
    +gopher
    +mouse

Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:
    diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
    lines = list(diff)[2:]  # drop the '---' and '+++' header lines
    added = [line[1:] for line in lines if line[0] == '+']
    removed = [line[1:] for line in lines if line[0] == '-']

    print('additions:')
    for line in added:
        print(line)
    print()
    print('additions, ignoring position:')
    for line in added:
        if line not in removed:
            print(line)

Outputting:
    additions:
    cat
    gopher
    mouse

    additions, ignoring position:
    gopher
    mouse

You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
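To tie this back to the original question, here is a rough sketch of running the same unified_diff call on two files on disk instead of on in-memory strings. The diff_files and write_file helpers, the temporary directories, and the sample contents are all hypothetical names and data made up for this illustration. An empty result is treated as "identical".

```python
import difflib
import os
import tempfile


def diff_files(path1, path2):
    """Return unified-diff lines for two text files; an empty list means no differences."""
    with open(path1) as f1, open(path2) as f2:
        lines1 = f1.read().splitlines()
        lines2 = f2.read().splitlines()
    return list(difflib.unified_diff(
        lines1, lines2, fromfile=path1, tofile=path2, lineterm=''))


def write_file(directory, name, text):
    """Hypothetical helper: create a small text file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(text)
    return path


# Stand-in files; in practice these would be the two real "hosts" paths.
first = write_file(tempfile.mkdtemp(), "hosts", "dog\ncat\nbird\n")
copy = write_file(tempfile.mkdtemp(), "hosts", "dog\ncat\nbird\n")
other = write_file(tempfile.mkdtemp(), "hosts", "dog\nbird\nmouse\n")

identical = diff_files(first, copy)
changed = diff_files(first, other)

if not identical:
    print("first and copy are identical")
for line in changed:
    print(line)
```

Because both files are read in full and split into lines, neither file is ever indexed past its end, so files of different lengths do not raise an IndexError. One caveat of this sketch: splitlines() discards line endings, so files that differ only in their line terminators or in a final trailing newline are reported as identical.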