I have two files called "hosts" (in different directories).
I want to compare them using Python to see whether they are identical. If they are not identical, I want to print the differences on the screen.
So far I have tried this:
    hosts0 = open(dst1 + "/hosts","r")
    hosts1 = open(dst2 + "/hosts","r")
    lines1 = hosts0.readlines()
    for i,lines2 in enumerate(hosts1):
        if lines2 != lines1[i]:
            print "line ", i, " in hosts1 is different \n"
            print lines2
        else:
            print "same"
But when I run this, I get
    File "./audit.py", line 34, in <module>
        if lines2 != lines1[i]:
    IndexError: list index out of range
Which means one of the hosts has more lines than the other. Is there a better method to compare 2 files and report the difference?
The Python standard library has a module specifically for finding diffs between strings or files: difflib. To get a diff, pass the two sequences of lines to its unified_diff function.
The standard library also has a module called filecmp. Its filecmp.cmp() function returns True or False depending on whether two files appear equal, and filecmp.cmpfiles() returns three lists: the names of the files that match, those that mismatch, and those that could not be compared.
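As a rough sketch of the filecmp approach, the snippet below checks two files named "hosts" for equality. The temporary directories, file contents, and the helper name files_identical are made up for illustration; only filecmp.cmp and filecmp.cmpfiles come from the standard library.

```python
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Return True if both files have the same contents (hypothetical helper)."""
    # shallow=False compares the file contents instead of trusting
    # the os.stat() signature (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)


# Two throwaway directories, each holding a file named "hosts" (made-up data).
dir_a = tempfile.mkdtemp()
dir_b = tempfile.mkdtemp()
with open(os.path.join(dir_a, "hosts"), "w") as f:
    f.write("127.0.0.1 localhost\n")
with open(os.path.join(dir_b, "hosts"), "w") as f:
    f.write("127.0.0.1 localhost\n10.0.0.5 buildbox\n")

same = files_identical(os.path.join(dir_a, "hosts"), os.path.join(dir_b, "hosts"))
print(same)

# cmpfiles compares the named files across two directories and returns
# three lists of names: matching, mismatching, and not comparable.
match, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, ["hosts"], shallow=False)
print(match, mismatch, errors)
```

This only answers whether the files are identical. To see what actually differs, you still need difflib.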
    import difflib

    lines1 = '''
    dog
    cat
    bird
    buffalo
    gophers
    hound
    horse
    '''.strip().splitlines()

    lines2 = '''
    cat
    dog
    bird
    buffalo
    gopher
    horse
    mouse
    '''.strip().splitlines()

    # Changes:
    # swapped positions of cat and dog
    # changed gophers to gopher
    # removed hound
    # added mouse

    for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm=''):
        print(line)
Outputs the following:
    --- file1
    +++ file2
    @@ -1,7 +1,7 @@
    +cat
     dog
    -cat
     bird
     buffalo
    -gophers
    -hound
    +gopher
     horse
    +mouse
This diff gives you context -- surrounding lines to help make it clear how the file is different. You can see "cat" here twice, because it was removed from below "dog" and added above it.
You can use n=0 to remove the context.
    for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
        print(line)
Outputting this:
    --- file1
    +++ file2
    @@ -0,0 +1 @@
    +cat
    @@ -2 +2,0 @@
    -cat
    @@ -5,2 +5 @@
    -gophers
    -hound
    +gopher
    @@ -7,0 +7 @@
    +mouse
But now it's full of the "@@" lines telling you the position in the file that has changed. Let's remove the extra lines to make it more readable.
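    for line in difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0):
        # Skip the file headers and the "@@" position markers.
        for prefix in ('---', '+++', '@@'):
            if line.startswith(prefix):
                break
        else:
            # No prefix matched, so this is an actual added or removed line.
            print(line)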
Giving us this output:
    +cat
    -cat
    -gophers
    -hound
    +gopher
    +mouse
Now what do you want it to do? If you ignore all removed lines, then you won't see that "hound" was removed. If you're happy just showing the additions to the file, then you could do this:
    diff = difflib.unified_diff(lines1, lines2, fromfile='file1', tofile='file2', lineterm='', n=0)
    lines = list(diff)[2:]  # drop the '---' and '+++' header lines
    added = [line[1:] for line in lines if line[0] == '+']
    removed = [line[1:] for line in lines if line[0] == '-']

    print('additions:')
    for line in added:
        print(line)
    print()
    print('additions, ignoring position:')
    for line in added:
        if line not in removed:
            print(line)
Outputting:
    additions:
    cat
    gopher
    mouse

    additions, ignoring position:
    gopher
    mouse
You can probably tell by now that there are various ways to "print the differences" of two files, so you will need to be very specific if you want more help.
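Applied to the original question, the same unified_diff call could be pointed at the two hosts files, roughly as sketched below. The helper name diff_files, the temporary directories standing in for dst1 and dst2, and the file contents are all assumptions made for this example. Because the files are read with splitlines(), a difference that consists only of a missing final newline would not be reported.

```python
import difflib
import os
import tempfile


def diff_files(path_a, path_b):
    """Return the unified diff between two text files as a list of lines."""
    with open(path_a) as file_a, open(path_b) as file_b:
        lines_a = file_a.read().splitlines()
        lines_b = file_b.read().splitlines()
    # unified_diff yields nothing at all when the two sequences are equal,
    # so an empty list means the files contain the same lines.
    return list(difflib.unified_diff(
        lines_a, lines_b, fromfile=path_a, tofile=path_b, lineterm=''))


# Stand-ins for the question's dst1 and dst2 directories (temporary, made-up data).
dst1 = tempfile.mkdtemp()
dst2 = tempfile.mkdtemp()
with open(os.path.join(dst1, "hosts"), "w") as f:
    f.write("127.0.0.1 localhost\n10.0.0.5 buildbox\n")
with open(os.path.join(dst2, "hosts"), "w") as f:
    f.write("127.0.0.1 localhost\n")

diff = diff_files(os.path.join(dst1, "hosts"), os.path.join(dst2, "hosts"))
if diff:
    for line in diff:
        print(line)
else:
    print("identical")
```

Unlike the index-based loop in the question, this does not care whether one file is longer than the other, so it cannot raise an IndexError.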