Using Python, I'd like to output the difference between two strings as a unified diff (-u) while, optionally, ignoring blank lines (-B) and spaces (-w).
Since the strings were generated internally, I'd prefer not to deal with the nuanced complexity of writing one or both strings to a file, running GNU diff, fixing up the output, and finally cleaning up.
While difflib.unified_diff generates unified diffs, it doesn't seem to let me tweak how spaces and blank lines are handled. I've looked at its implementation and I suspect the only solution is to copy/hack that function's body.
Is there anything better?
For the moment I'm stripping the pad characters using something like:
import difflib
import re
import sys

l = "line 1\nline 2\nline 3\n"
r = "\nline 1\n\nline 2\nline3\n"
strip_spaces = True
strip_blank_lines = True

if strip_spaces:
    l = re.sub(r"[ \t]+", r"", l)
    r = re.sub(r"[ \t]+", r"", r)
if strip_blank_lines:
    l = re.sub(r"^\n", r"", re.sub(r"\n+", r"\n", l))
    r = re.sub(r"^\n", r"", re.sub(r"\n+", r"\n", r))

# run diff
diff = difflib.unified_diff(l.splitlines(keepends=True), r.splitlines(keepends=True))
sys.stdout.writelines(list(diff))
which, of course, produces a diff of something other than the original input. For instance, pass the above text to GNU diff 3.3 run as "diff -u -w" and "line 3" is displayed as part of the context, whereas the code above would display "line3".
Make your own SequenceMatcher, copy the unified_diff body, and replace SequenceMatcher with your own matcher.
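One way to approximate this without subclassing is to keep the stock SequenceMatcher but feed it normalized comparison keys, then print the original lines from a loop modeled on the unified_diff body. The following is a minimal sketch under stated assumptions, not a drop-in replacement for GNU diff: the function name unified_diff_ignoring and its ignore_space / ignore_blank_lines parameters are hypothetical, context lines are taken from the left-hand input, and blank lines are simply filtered out before matching (so they never appear in the output and hunk line numbers refer to the filtered sequences).

```python
import difflib
import re
import sys


def _fmt_range(start, stop):
    """Format a hunk range the way unified diffs do (1-based start, length)."""
    length = stop - start
    beginning = start + 1
    if length == 1:
        return str(beginning)
    if length == 0:
        beginning -= 1  # empty ranges point at the line before
    return f"{beginning},{length}"


def unified_diff_ignoring(a_lines, b_lines, fromfile="a", tofile="b", n=3,
                          ignore_space=False, ignore_blank_lines=False):
    """Yield unified-diff lines, matching on normalized keys but printing originals.

    Hypothetical helper: ignore_space roughly mimics `diff -w`,
    ignore_blank_lines is a simplified stand-in for `diff -B`.
    """
    if ignore_blank_lines:
        # Simplification: whitespace-only lines are dropped before matching.
        a_lines = [ln for ln in a_lines if ln.strip()]
        b_lines = [ln for ln in b_lines if ln.strip()]

    def key(line):
        # The key is used only for matching; the original line is what gets printed.
        return re.sub(r"\s+", "", line) if ignore_space else line

    matcher = difflib.SequenceMatcher(
        None, [key(ln) for ln in a_lines], [key(ln) for ln in b_lines])

    header_done = False
    for group in matcher.get_grouped_opcodes(n):
        if not header_done:
            yield f"--- {fromfile}\n"
            yield f"+++ {tofile}\n"
            header_done = True
        first, last = group[0], group[-1]
        a_range = _fmt_range(first[1], last[2])
        b_range = _fmt_range(first[3], last[4])
        yield f"@@ -{a_range} +{b_range} @@\n"
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                # Context comes from the left input, with its original spacing.
                for line in a_lines[i1:i2]:
                    yield " " + line
                continue
            if tag in ("replace", "delete"):
                for line in a_lines[i1:i2]:
                    yield "-" + line
            if tag in ("replace", "insert"):
                for line in b_lines[j1:j2]:
                    yield "+" + line


left = "line 1\nline 2\nline 3\n".splitlines(keepends=True)
right = "\nline 1\n\nline 2\nline3\n".splitlines(keepends=True)
sys.stdout.writelines(unified_diff_ignoring(left, right, ignore_space=True))
```

Because only the keys are normalized, "line 3" and "line3" match each other while the context still shows the original "line 3". A closer emulation of -B would need to keep blank lines in the sequences and suppress only hunks whose changed lines are all blank.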