
Why diff with ignore matching lines doesn't work as expected?

Tags: shell, diff

I have the following files:

file1.txt:

###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}

file2.txt:

###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

And I'm running the following command:

diff -I 'Memory' file1.txt file2.txt

which outputs:

6,7c6,7
< Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
< Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
---
> Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
> Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

However, my expected output is:

< Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
---
> Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

Note that if I change 'Memory' to 'Tab' or 'Title' in the command, the problem goes away, but probably only because every changed line is then ignored, since they all contain Tab and Title.

asked Apr 04 '13 09:04 by A K


1 Answer

This behaviour is normal given the way diff works (as of April 2013).

diff is line-oriented: a line is considered either entirely different or entirely equivalent. A line that matches the ignore pattern is still entered into the list of differing lines before comparison; when the change script is computed, a change made up only of ignored lines is itself ignored. When ignored lines are adjacent to other changed lines, however, together they form a single change that is not ignored. That is what happens here: the Memory and Cpu lines are adjacent, so they end up in one change (6,7c6,7) that is reported in full.
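The rule described above can be observed with a small experiment. This is only a sketch, assuming GNU diff (where -I is available); the file names a.txt and b.txt and the shortened line contents are made up for illustration and are not the files from the question.

```shell
# Illustrative scratch directory and file names (not from the question).
tmp=$(mktemp -d)

# Case 1: the only differing line matches the ignore pattern.
printf '%s\n' "Tab=> 'Device' 256" "Tab=> 'Memory' 962192"  > "$tmp/a.txt"
printf '%s\n' "Tab=> 'Device' 256" "Tab=> 'Memory' 9621932" > "$tmp/b.txt"
# The change consists only of ignorable lines, so nothing is printed.
only_ignored=$(diff -I 'Memory' "$tmp/a.txt" "$tmp/b.txt" || true)

# Case 2: append a differing line directly after the 'Memory' line.
printf '%s\n' "Tab=> 'Cpu' 9030"  >> "$tmp/a.txt"
printf '%s\n' "Tab=> 'Cpu' 90303" >> "$tmp/b.txt"
# Both lines now belong to one change, and that change is not ignorable.
mixed=$(diff -I 'Memory' "$tmp/a.txt" "$tmp/b.txt" || true)

printf 'case 1 output: [%s]\n' "$only_ignored"
printf 'case 2 output:\n%s\n' "$mixed"
```

In the first case the output is empty; in the second, the 'Memory' line is reported together with the 'Cpu' line as a single 2,3c2,3 change, mirroring the 6,7c6,7 output in the question.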

The underlying problem is that diff cannot know that consecutive lines are unrelated: you are not diffing a sequence of text (which is what diff is designed for) but a list of independent, keyed lines (Tab=> <key>). The two problems look similar when both files are generated in the same order, but they are not the same.
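Since the lines are independent, one possible workaround (not something diff offers itself) is to drop the unwanted key from both files before comparing them. The sketch below recreates the two files from the question in a scratch directory; the directory and the .filtered file names are illustrative assumptions.

```shell
# Recreate the question's files in an illustrative scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1.txt" <<'EOF'
###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
EOF
cat > "$tmp/file2.txt" <<'EOF'
###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}
EOF

# Remove the 'Memory' line from each file, then compare what is left.
grep -v "'Memory'" "$tmp/file1.txt" > "$tmp/file1.filtered"
grep -v "'Memory'" "$tmp/file2.txt" > "$tmp/file2.filtered"

# diff exits with status 1 when the inputs differ, hence the "|| true".
filtered_diff=$(diff "$tmp/file1.filtered" "$tmp/file2.filtered" || true)
printf '%s\n' "$filtered_diff"
```

Only the Cpu line is reported. Note that the change header reads 6c6 rather than 7c7, because the line numbers refer to the filtered copies, not to the original files.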

answered Oct 15 '22 07:10 by armel