
Python Difflib - How to Get SDiff Sequences with "Change" Op

I am reading the documentation for Python's difflib. According to the docs, each line of a Differ delta begins with a two-letter code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence 

But what about the "Change" operation? How do I get a "c " instruction similar to the results in Perl's sdiff?

David Williams asked Nov 02 '25 14:11


2 Answers

Take a look at this script:

sdiff.py @ hungrysnake.net

http://hungrysnake.net/doc/software__sdiff_py.html

Perl's sdiff (Algorithm::Diff) does not take the "matching rate" (how similar two lines are) into account, but Python's sdiff.py does. =)

I have two text files.

$ cat text1.txt
aaaaaa
bbbbbb
cccccc
dddddd
eeeeee
ffffff

$ cat text2.txt
aaaaaa
bbbbbb
xxxxxxx
ccccccy
zzzzzzz
eeeeee
ffffff

This is the side-by-side diff I get from the sdiff command or from Perl's sdiff (Algorithm::Diff):

$ sdiff text1.txt text2.txt
aaaaaa          aaaaaa
bbbbbb          bbbbbb
cccccc      |   xxxxxxx
dddddd      |   ccccccy
            >   zzzzzzz
eeeeee          eeeeee
ffffff          ffffff

sdiff does not take the "matching rate" into account. =(

This is what I get from sdiff.py:

$ sdiff.py text1.txt text2.txt
--- text1.txt (utf-8)
+++ text2.txt (utf-8)
 1|aaaaaa             1|aaaaaa
 2|bbbbbb             2|bbbbbb
  |            >      3|xxxxxxx
 3|cccccc      |      4|ccccccy
 4|dddddd      <       |
  |            >      5|zzzzzzz
 5|eeeeee             6|eeeeee
 6|ffffff             7|ffffff

[     ]      |      + 
[ <-  ]     3|cccccc  
[  -> ]     4|ccccccy 

sdiff.py takes the "matching rate" into account. =)

I want the result that sdiff.py gives. Don't you?
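
For comparison, the standard library's difflib.Differ also pairs similar lines before reporting them. The following is only a minimal sketch that runs Differ over the two sample files above; the variable names text1, text2 and delta are chosen here for illustration, and the lines are given without trailing newlines.

```python
import difflib

# The two sample files from above, as lists of lines (no trailing newlines).
text1 = ["aaaaaa", "bbbbbb", "cccccc", "dddddd", "eeeeee", "ffffff"]
text2 = ["aaaaaa", "bbbbbb", "xxxxxxx", "ccccccy", "zzzzzzz", "eeeeee", "ffffff"]

# Differ.compare() yields one delta entry per line, each prefixed
# with a two-character code ('  ', '- ', '+ ' or '? ').
delta = list(difflib.Differ().compare(text1, text2))

for entry in delta:
    print(entry.rstrip("\n"))
```

In this run "cccccc" should be reported next to "ccccccy" with a '? ' guide line, rather than next to "xxxxxxx" as in the plain sdiff output.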

user3103665 answered Nov 04 '25 06:11


difflib has no direct 'c'-like code for changed lines, unlike the Perl sdiff you mention, but you can easily derive one. In difflib's delta, a changed line is also tagged '- '. Unlike a truly deleted line, however, it is accompanied by a line tagged '? ' within the next few lines of the delta, which tells you that the '- ' line was changed, not deleted. The '? ' line also acts as a guide showing where in the line the changes are.

So, if a line in the delta is tagged with '- ', then there are four different cases depending on the next few lines of the delta:

Case 1: The line is modified by inserting some characters

- The good bad
+ The good the bad
?          ++++

Case 2: The line is modified by deleting some characters

- The good the bad
?          ----
+ The good bad

Case 3: The line is modified by deleting and inserting and/or replacing some characters:

- The good the bad and ugly
?      ^^ ----
+ The g00d bad and the ugly
?      ^^          ++++

Case 4: The line is deleted

- The good the bad and the ugly
+ Our ratio is less than 0.75!

As you can see, the lines tagged '? ' show exactly where each modification was made and of what type. In Case 4 there is no '? ' line at all.

Note that difflib treats a line as deleted rather than changed if the value of ratio() between the two lines being compared is less than 0.75. I found this value through some tests.
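
The threshold can be checked on the example lines from the cases above. This is only a small sketch: line_ratio and has_guide_line are helper names made up here, and it assumes a default Differ() with no junk filters.

```python
import difflib

def line_ratio(old, new):
    """Similarity of two lines as computed by SequenceMatcher.ratio()."""
    return difflib.SequenceMatcher(None, old, new).ratio()

def has_guide_line(old, new):
    """True if Differ emits a '? ' guide line for this single-line pair."""
    delta = difflib.Differ().compare([old], [new])
    return any(entry.startswith("? ") for entry in delta)

# Pair from Case 1 (treated as a change) and pair from Case 4 (treated as a deletion).
similar = ("The good bad", "The good the bad")
different = ("The good the bad and the ugly", "Our ratio is less than 0.75!")

for old, new in (similar, different):
    print(round(line_ratio(old, new), 3), has_guide_line(old, new))
```

The first pair has a ratio above 0.75 and gets a '? ' line; the second has a ratio below 0.75 and does not.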

So, to classify a line as changed, you can do the following. It returns the diff with changed lines tagged 'c' and unchanged lines tagged 'u', just like in Perl's sdiff:

import difflib

def sdiffer(s1, s2):
    """Return (code, line) pairs: 'u' unchanged, 'c' changed, '+' added, '-' deleted."""
    diffs = list(difflib.Differ().compare(s1, s2))
    length = len(diffs)

    def tag(k):
        # Two-character code of delta entry k, or '' past the end of the delta.
        return diffs[k][:2] if k < length else ''

    sdiffs = []
    i = 0
    while i < length:
        code, line = diffs[i][:2], diffs[i][2:]
        if code == '  ':
            sdiffs.append(('u', line))
        elif code == '+ ':
            sdiffs.append(('+', line))
        elif code == '- ':
            if tag(i + 1) == '? ':
                # '- ', '? ', '+ ' (Case 2), optionally followed by a second '? ' (Case 3)
                sdiffs.append(('c', line))
                i += 3 if tag(i + 3) == '? ' else 2
            elif tag(i + 1) == '+ ' and tag(i + 2) == '? ':
                # '- ', '+ ', '? ' (Case 1)
                sdiffs.append(('c', line))
                i += 2
            else:
                # No '? ' guide line nearby: a plain deletion (Case 4)
                sdiffs.append(('-', line))
        i += 1
    return sdiffs
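
If you do not need the 0.75 similarity check, a different technique is to read the 'replace' blocks straight from SequenceMatcher.get_opcodes(). This is only a rough sketch of that alternative, not what Differ does internally: opcode_sdiff is a name made up here, and lines inside a 'replace' block are paired purely by position.

```python
import difflib
from itertools import zip_longest

def opcode_sdiff(lines1, lines2):
    """Return (code, left, right) rows built from SequenceMatcher opcodes."""
    matcher = difflib.SequenceMatcher(None, lines1, lines2)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left, right = lines1[i1:i2], lines2[j1:j2]
        if tag == "equal":
            rows.extend(("u", l, r) for l, r in zip(left, right))
        elif tag == "delete":
            rows.extend(("-", l, None) for l in left)
        elif tag == "insert":
            rows.extend(("+", None, r) for r in right)
        else:  # "replace": pair lines by position, leftovers become '-' or '+'
            for l, r in zip_longest(left, right):
                if l is None:
                    rows.append(("+", None, r))
                elif r is None:
                    rows.append(("-", l, None))
                else:
                    rows.append(("c", l, r))
    return rows

# Sample input: the two text files from the other answer.
text1 = ["aaaaaa", "bbbbbb", "cccccc", "dddddd", "eeeeee", "ffffff"]
text2 = ["aaaaaa", "bbbbbb", "xxxxxxx", "ccccccy", "zzzzzzz", "eeeeee", "ffffff"]

for code, left, right in opcode_sdiff(text1, text2):
    print(code, left, right)
```

On this input the rows line up the way the plain sdiff command does ("cccccc" against "xxxxxxx"), because no similarity between lines is taken into account.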

Hope it helps.

P.S.: It is an old question, so I am not sure how well my efforts will be rewarded. :-( I just could not help answering it, as I have been working a little with difflib lately.

Sнаđошƒаӽ answered Nov 04 '25 06:11


