Is there a way in Python, using difflib, to get the offsets of the changes as well as the changes themselves?
What I have is the following:
import difflib
text1 = 'this is a sample text'.split()
text2 = 'this is text two.'.split()
print(list(difflib.ndiff(text1, text2)))
which prints:
['  this', '  is', '- a', '- sample', '  text', '+ two.']
Can I also get the offsets of the corresponding changes? A naive solution would be to search for each change in the text, but that breaks down once the strings get longer and contain repeated terms.
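SequenceMatcher.get_matching_blocks() might help. It returns a list of triples (i, j, n) describing matching subsequences, each meaning a[i:i+n] == b[j:j+n]; the last triple is a dummy with n == 0. The gaps between consecutive blocks are where the two sequences differ, so these indices can be used to find the location of the differences.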
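>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")
>>> for block in s.get_matching_blocks():
...     print("a[%d] and b[%d] match for %d elements" % block)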
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher.get_matching_blocks https://docs.python.org/2/library/difflib.html#sequencematcher-examples
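Applied to the word lists from the question, one way to read off the change locations directly is SequenceMatcher.get_opcodes(), which reports each span as (tag, i1, i2, j1, j2). This is only a sketch: the helper name word_changes is made up for illustration, and the indices are word positions in the split lists, not character offsets.

```python
import difflib

def word_changes(a, b):
    """Return the non-matching opcodes as (tag, a_start, a_end, b_start, b_end).

    word_changes is a hypothetical helper name, not part of difflib.
    """
    matcher = difflib.SequenceMatcher(None, a, b)
    # Keep only 'replace', 'delete' and 'insert'; drop the 'equal' spans.
    return [op for op in matcher.get_opcodes() if op[0] != 'equal']

text1 = 'this is a sample text'.split()
text2 = 'this is text two.'.split()

for tag, i1, i2, j1, j2 in word_changes(text1, text2):
    # a[i1:i2] is the affected slice of text1, b[j1:j2] the slice of text2.
    print("%-7s a[%d:%d]=%r  b[%d:%d]=%r"
          % (tag, i1, i2, text1[i1:i2], j1, j2, text2[j1:j2]))
```

For this input the helper returns [('delete', 2, 4, 2, 2), ('insert', 5, 5, 3, 4)]: words 2 and 3 of text1 were removed, and word 3 of text2 was added. If character offsets are needed instead, the original strings can be passed in place of the split lists, since SequenceMatcher accepts any sequences of hashable items.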