When comparing similar lines, I want to highlight the differences on the same line:
a) lorem ipsum dolor sit amet
b) lorem foo ipsum dolor amet
lorem <ins>foo</ins> ipsum dolor <del>sit</del> amet
While difflib.HtmlDiff appears to do this sort of inline highlighting, it produces very verbose markup.
Unfortunately, I have not been able to find another class/method which does not operate on a line-by-line basis.
Am I missing anything? Any pointers would be appreciated!
For your simple example:
import difflib
def show_diff(seqm):
"""Unify operations between two compared strings
seqm is a difflib.SequenceMatcher instance whose a & b are strings"""
output= []
for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
if opcode == 'equal':
output.append(seqm.a[a0:a1])
elif opcode == 'insert':
output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
elif opcode == 'delete':
output.append("<del>" + seqm.a[a0:a1] + "</del>")
elif opcode == 'replace':
raise NotImplementedError("what to do with 'replace' opcode?")
else:
raise RuntimeError("unexpected opcode")
return ''.join(output)
>>> sm= difflib.SequenceMatcher(None, "lorem ipsum dolor sit amet", "lorem foo ipsum dolor amet")
>>> show_diff(sm)
'lorem<ins> foo</ins> ipsum dolor <del>sit </del>amet'
This works with strings. You should decide what to do with "replace" opcodes.
Here's an inline differ inspired by @tzot's answer above (also Python 3 compatible):
def inline_diff(a, b):
import difflib
matcher = difflib.SequenceMatcher(None, a, b)
def process_tag(tag, i1, i2, j1, j2):
if tag == 'replace':
return '{' + matcher.a[i1:i2] + ' -> ' + matcher.b[j1:j2] + '}'
if tag == 'delete':
return '{- ' + matcher.a[i1:i2] + '}'
if tag == 'equal':
return matcher.a[i1:i2]
if tag == 'insert':
return '{+ ' + matcher.b[j1:j2] + '}'
assert False, "Unknown tag %r"%tag
return ''.join(process_tag(*t) for t in matcher.get_opcodes())
It's not perfect, for example, it would be nice to expand 'replace' opcodes to recognize the full word replaced instead of just the few different letters, but it's a good place to start.
Sample output:
>>> a='Lorem ipsum dolor sit amet consectetur adipiscing'
>>> b='Lorem bananas ipsum cabbage sit amet adipiscing'
>>> print(inline_diff(a, b))
Lorem{+ bananas} ipsum {dolor -> cabbage} sit amet{- consectetur} adipiscing
difflib.SequenceMatcher will operate on single lines. You can use the "opcodes" to determine how to change the first line to make it the second line.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With