I'm playing around with difflib in Python and I'm having some difficulty getting the output to look good. For some strange reason, difflib is adding a single whitespace before each character. For example, I have a file (textfile01.txt) that looks like this:
test text which has no meaning
and textfile02.txt looks like this:
test text which has no meaning
but looks nice
Here's a small code sample for how I'm trying to accomplish the comparison:
import difflib
handle01 = open('textfile01.txt', 'r')
handle02 = open('textfile02.txt', 'r')
d = difflib.ndiff(handle01.read(), handle02.read())
print "".join(list(d))
Then I get this ugly output that looks very strange:
t e s t t e x t w h i c h h a s n o m e a n i n g-
- b- u- t- - l- o- o- k- s- - n- i- c- e
As you can see, the output looks horrible. I've just been following basic difflib tutorials I found online, and according to those, the output should look completely different. I have no clue what I'm doing wrong. Any ideas?
difflib.ndiff compares two sequences of lines (lists of strings), but you are passing it plain strings, and a string is itself a sequence of characters. The function is therefore comparing the two files character by character.
>>> list(difflib.ndiff("test", "testa"))
['  t', '  e', '  s', '  t', '+ a']
(Literally, you can go from the list ["t", "e", "s", "t"] to the list ["t", "e", "s", "t", "a"] by adding the element "a" at the end.)
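To connect this to the files in the question, here is a small sketch that rebuilds their contents as the strings read() would return. It assumes textfile01.txt ends with a newline and textfile02.txt does not, which the question doesn't actually say. It shows that ndiff yields one entry per character when given strings, and that a string gives exactly the same result as the list of its characters.

```python
import difflib

# Hypothetical file contents, as read() would return them
# (assumption: only the first file ends with a newline).
text01 = "test text which has no meaning\n"
text02 = "test text which has no meaning\nbut looks nice"

# Passing whole strings: ndiff iterates over them character by character,
# producing one "  x" / "+ x" / "- x" entry per character.
char_diff = list(difflib.ndiff(text01, text02))
print(len(char_diff))
print(char_diff[:4])

# A str behaves exactly like the list of its characters here.
assert char_diff == list(difflib.ndiff(list(text01), list(text02)))
```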
You want to change read() to readlines() so that the two files are compared line by line, which is probably what you were expecting.
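You may also need to adjust the join. readlines() keeps the trailing newline on each line, so "".join(... already prints one diff line per source line; if your lines have no trailing newlines (as in the example below), change "".join(... to "\n".join(... in order to get a diff-like output on screen.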
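Putting the readlines() change into the original script could look like the sketch below. The helper name diff_files is made up for illustration, the script writes the two sample files to a temporary directory so it can run on its own, and it uses sys.stdout.write so the same code runs on Python 2.7 and Python 3. Because readlines() keeps each line's newline, this version joins the diff with "".

```python
import difflib
import os
import shutil
import sys
import tempfile


def diff_files(path01, path02):
    """Return an ndiff of two text files, compared line by line (hypothetical helper)."""
    with open(path01) as handle01, open(path02) as handle02:
        # readlines() returns a list of lines that keep their trailing "\n",
        # so ndiff compares whole lines instead of single characters.
        lines01 = handle01.readlines()
        lines02 = handle02.readlines()
    # Each diff line already ends in "\n", so join with "" rather than "\n".
    return "".join(difflib.ndiff(lines01, lines02))


# Recreate the two sample files from the question in a temporary directory.
tmpdir = tempfile.mkdtemp()
try:
    path01 = os.path.join(tmpdir, "textfile01.txt")
    path02 = os.path.join(tmpdir, "textfile02.txt")
    with open(path01, "w") as f:
        f.write("test text which has no meaning\n")
    with open(path02, "w") as f:
        f.write("test text which has no meaning\nbut looks nice\n")
    result = diff_files(path01, path02)
finally:
    # Remove the temporary files once the diff has been computed.
    shutil.rmtree(tmpdir)

sys.stdout.write(result)
```

The unchanged line comes out prefixed with two spaces and the new line with "+ ", which is the linewise output the tutorials describe.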
>>> list(difflib.ndiff(["test"], ["testa"]))
['- test', '+ testa', '?     +\n']
>>> print "\n".join(_)
- test
+ testa
?     +
(Here difflib is being extra nice and marking the exact position where the character was added in the ? line.)
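The same guide lines appear when a single character changes inside a line. In this small sketch (the sample strings are made up for illustration), ndiff emits a "?" line under both the old and the new line, with a "^" under the replaced character; the code checks the caret's column rather than hard-coding the whole output.

```python
import difflib
import sys

old = ["test text\n"]
new = ["test next\n"]

diff = list(difflib.ndiff(old, new))
sys.stdout.write("".join(diff))

# "?" lines are hints: "^" marks a changed character, "+" an added one,
# "-" a removed one. The first two columns hold the "? " prefix, so the
# caret sits at 2 + (index of the changed character within the line).
hints = [line for line in diff if line.startswith("?")]
changed_at = 5  # "t" -> "n" is at index 5 of "test text"
for hint in hints:
    assert hint.index("^") == 2 + changed_at
```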