Consider the following files and diff results:
a1.txt
a
b
My name is Ian
a2.txt
a
a
b
My name is John
Running diff --side-by-side --suppress-common-lines a1.txt a2.txt produces:
> a
My name is Ian | My name is John
This correctly states that a was added in a2.txt and that My name is Ian changed to My name is John.
However, if I remove the b from both files, the produced results are different:
b1.txt
a
My name is Ian
b2.txt
a
a
My name is John
Running diff --side-by-side --suppress-common-lines b1.txt b2.txt produces:
My name is Ian | a
> My name is John
This states that line My name is Ian changed to a and My name is John was added to b2.txt.
Even though the result of the second comparison is technically valid, the difference between a1.txt and a2.txt is equivalent to that of b1.txt and b2.txt, so why would the result not be equal?
Is there anything I can do such that the second comparison produces the same output as the first?
The discrepancy you observe between the two examples is normal; it just conflicts with your expectations of what diff does. The diff utility solves the longest-common-subsequence problem, using lines as units/atoms.
[...] the difference between a1.txt and a2.txt is equivalent to that of b1.txt and b2.txt, so why would the result not be equal?
Here, the longest common subsequences in your two examples are different and, roughly speaking, don't "line up" the same way. In the first example, you have
# a1.txt          # a2.txt           # line in common?
                  a                  n
a                 a                  y
b                 b                  y
My name is Ian    My name is John    n
whereas, in the second example, you have
# b1.txt          # b2.txt           # line in common?
a                 a                  y
My name is Ian    a                  n
                  My name is John    n
Therefore, as far as diff is concerned, the differences between the two pairs of files are not equivalent. diff has no memory that all you did to obtain the b[12].txt files was to remove the b line from each of the a[12].txt files. All it sees is that the longest common subsequence now consists only of the one line that contains a, and it deduces the difference between the two b[12].txt files from that.
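To see this concretely, here is a minimal sketch that computes a longest common subsequence over lines with the textbook dynamic-programming table. It is not how GNU diff is actually implemented, and the helper name lcs_lines is made up for illustration; it only shows that the two pairs of files have different common subsequences.

```python
def lcs_lines(left, right):
    """Return one longest common subsequence of two lists of lines."""
    rows, cols = len(left), len(right)
    # table[i][j] holds the LCS length of left[:i] and right[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if left[i - 1] == right[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the common lines.
    common = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if left[i - 1] == right[j - 1]:
            common.append(left[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return common[::-1]


a1 = ["a", "b", "My name is Ian"]
a2 = ["a", "a", "b", "My name is John"]
b1 = ["a", "My name is Ian"]
b2 = ["a", "a", "My name is John"]

print(lcs_lines(a1, a2))  # ['a', 'b']
print(lcs_lines(b1, b2))  # ['a']
```

With the b line gone, the only thing left to anchor the comparison is a single a, and nothing in the data says which of the two a lines in b2.txt it should be paired with.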
Is there anything I can do such that the second comparison produces the same output as the first?
Short of using a different diff algorithm (or implementing your own), I don't think so.
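If you do go the "implement your own" route, the following is a rough sketch of one possible approach, not a general solution. It computes the same longest common subsequence, but when several alignments are equally long it pairs each common line with the latest possible line of the second file. The function names late_matches and side_by_side and the tie-breaking rule are my own assumptions, and the output format only loosely imitates diff --side-by-side --suppress-common-lines.

```python
from itertools import zip_longest


def late_matches(left, right):
    """Return (left_index, right_index) pairs of matched lines.

    Ties between equally long alignments are broken by dropping lines of
    `left` first, so matches land as late as possible in `right`.
    """
    rows, cols = len(left), len(right)
    # table[i][j] holds the LCS length of left[:i] and right[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if left[i - 1] == right[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    pairs = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if left[i - 1] == right[j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1  # tie-break: skip a line of `left` first
        else:
            j -= 1
    return pairs[::-1]


def side_by_side(left, right):
    """Render only the differing lines between two lists of lines."""
    out = []
    prev_i = prev_j = 0
    # A sentinel match just past the end flushes the final gap.
    matches = late_matches(left, right) + [(len(left), len(right))]
    for i, j in matches:
        for l, r in zip_longest(left[prev_i:i], right[prev_j:j]):
            if l is None:
                out.append(f"> {r}")
            elif r is None:
                out.append(f"{l} <")
            else:
                out.append(f"{l} | {r}")
        prev_i, prev_j = i + 1, j + 1
    return out


a1 = ["a", "b", "My name is Ian"]
a2 = ["a", "a", "b", "My name is John"]
b1 = ["a", "My name is Ian"]
b2 = ["a", "a", "My name is John"]

print(side_by_side(a1, a2))
print(side_by_side(b1, b2))
```

For these two examples, both calls report the extra a as an addition and My name is Ian as changed to My name is John. That is purely a consequence of the tie-breaking rule chosen here; other inputs can just as easily favor the opposite rule, which is why no single algorithm will always match your intuition.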