I have two files that I want to diff. The lines have timestamps and possibly some other stuff I would like to ignore for the matching algorithm, but I still want those items output if the matching algorithm finds a difference in the rest of the text. For example:
1c1
< [junit4] 2013-01-11 04:43:57,392 INFO com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
---
> [junit4] 2013-01-11 22:16:07,398 INFO com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
SHOULD NOT be emitted but:
1c1
< [junit4] 2013-01-11 04:43:57,392 INFO com.example.MyClass:123 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
---
> [junit4] 2013-01-11 22:16:07,398 INFO com.example.MyClass:456 [main] [loadOverridePropFile] Config file application.properties not found: java.io.FileNotFoundException: /path/to/application.properties (No such file or directory)
SHOULD be emitted (since the line numbers are different). Note that the timestamps are still emitted.
How can this be done?
I had wished for this feature a couple of times myself, and since it popped up here again I decided to google around a bit and found Perl's Algorithm::Diff, which you can feed a hashing function (the documentation calls these "key generation functions") that "should return a string that uniquely identifies a given element". The algorithm then uses that string for the comparison instead of the actual content you feed it.
Basically, all you need to do is add a sub that uses a regex to filter the unwanted parts out of each line, and pass a reference to that sub as an additional parameter in the call to diff() (see the CHANGE 1 and CHANGE 2 comments in the snippet below).
If you require normal (or unified) diff output, check the more elaborate diffnew.pl example that the module ships with and make the necessary changes in that file. For demonstration purposes, I will use the simple diff.pl that it also ships with, since it is short and I can post it here in full.
#!/usr/bin/perl
# based on diff.pl that ships with Algorithm::Diff
# demonstrates the use of a key generation function
# the original diff.pl is:
# Copyright 1998 M-J. Dominus. ([email protected])
# This program is free software; you can redistribute it and/or modify it
# under the same terms as Perl itself.
use Algorithm::Diff qw(diff);
die("Usage: $0 file1 file2") unless @ARGV == 2;
my ($file1, $file2) = @ARGV;
-f $file1 or die("$file1: not a regular file");
-f $file2 or die("$file2: not a regular file");
-T $file1 or die("$file1: binary file");
-T $file2 or die("$file2: binary file");
open (F1, $file1) or die("Couldn't open $file1: $!");
open (F2, $file2) or die("Couldn't open $file2: $!");
chomp(@f1 = <F1>);
close F1;
chomp(@f2 = <F2>);
close F2;
# CHANGE 1
# $diffs = diff(\@f1, \@f2);
$diffs = diff(\@f1, \@f2, \&keyfunc);
exit 0 unless @$diffs;
foreach $chunk (@$diffs)
{
    foreach $line (@$chunk)
    {
        my ($sign, $lineno, $text) = @$line;
        printf "%4d$sign %s\n", $lineno+1, $text;
    }
}
exit 1;
# CHANGE 2 {
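sub keyfunc
{
    # work on a copy so the original line is still what gets printed
    my $line = shift;
    # drop the leading HH:MM timestamp so it is ignored in the comparison
    $line =~ s/^\d{2}:\d{2}\s+//;
    return $line;
}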
# }
one.txt:
12:15 one two three
13:21 three four five
two.txt:
10:01 one two three
14:38 seven six eight
$ ./mydiff.pl one.txt two.txt
2- 13:21 three four five
2+ 14:38 seven six eight
And here is one in normal diff output, based on diffnew.pl:
$ ./my_diffnew.pl one.txt two.txt
2c2
< 13:21 three four five
---
> 14:38 seven six eight
As you can see, the first line in either file gets ignored because they only differ in their timestamp and the hashing function removes those for the comparison.
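The sample key function only strips an HH:MM prefix, so for the log lines in the question the regex needs adapting. Below is a minimal sketch of what that could look like. The name log_keyfunc and the exact pattern are my assumptions, based only on the two example lines in the question (an optional "[tag]" prefix followed by a "YYYY-MM-DD HH:MM:SS,mmm" timestamp). It needs nothing beyond core Perl, so you can check the keys before wiring it into the diff script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical key function for the log format shown in the question.
# It keeps an optional leading "[tag] " and removes only the
# "YYYY-MM-DD HH:MM:SS,mmm" timestamp that follows it.
sub log_keyfunc {
    my ($line) = @_;
    $line =~ s/^((?:\[[^\]]*\]\s+)?)\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2},\d{3}\s+/$1/;
    return $line;
}

# Shortened versions of the example lines from the question.
my $old     = '[junit4] 2013-01-11 04:43:57,392 INFO com.example.MyClass:123 [main] Config file not found';
my $same    = '[junit4] 2013-01-11 22:16:07,398 INFO com.example.MyClass:123 [main] Config file not found';
my $changed = '[junit4] 2013-01-11 22:16:07,398 INFO com.example.MyClass:456 [main] Config file not found';

# Lines that differ only in the timestamp should produce equal keys ...
print "timestamp only: ",
    (log_keyfunc($old) eq log_keyfunc($same) ? "same key" : "different key"), "\n";

# ... while a change elsewhere (here the source line number) should not.
print "line number:    ",
    (log_keyfunc($old) eq log_keyfunc($changed) ? "same key" : "different key"), "\n";
```

To use it with the script above, pass \&log_keyfunc instead of \&keyfunc as the third argument to diff(). Because the key is only used for matching, the printed lines still carry their original timestamps.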
Voilà, you just rolled your own content-aware diff!