
How do I get UNIX diff to ignore duplicate lines in different positions?

Tags: unix, diff, csv

I have two CSV files, each about 134 MB.

All I want is the 'diff' of the two files, except that the position of a line shouldn't matter.

In other words, let's say I have:

abc,123
def,456

and

def,456
ghi,789

I don't want to be told about def,456. It's in a different position in the second file, but I want it to be counted as not being different.

Just doing diff file1 file2 > outputfile isn't working. What command should I use to do this? I know this is trivial in PHP, but I run out of memory quickly. I'd rather just use UNIX command line tools. diff may not even be the right utility for this.

Phil asked Dec 20 '25 22:12



1 Answer

I would suggest sorting the two input files first and then comparing the sorted versions, so that the original position of a line no longer matters. Something like this:

sort file1 > sorted_1
sort file2 > sorted_2

diff sorted_1 sorted_2
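
As a possible variation on the same sort-first idea (not part of the original answer), comm can report the lines unique to each sorted file separately. The sketch below assumes the two-line sample files from the question, recreated in a scratch directory; the workdir, only_in_1 and only_in_2 names are made up for illustration. LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Recreate the sample data from the question in a scratch directory
# (workdir is a hypothetical name used only for this sketch).
workdir=$(mktemp -d)
printf 'abc,123\ndef,456\n' > "$workdir/file1"
printf 'def,456\nghi,789\n' > "$workdir/file2"

# Sort both files under the same fixed locale; comm requires sorted input.
LC_ALL=C sort "$workdir/file1" > "$workdir/sorted_1"
LC_ALL=C sort "$workdir/file2" > "$workdir/sorted_2"

# comm -23 keeps only lines unique to the first file,
# comm -13 keeps only lines unique to the second file.
only_in_1=$(LC_ALL=C comm -23 "$workdir/sorted_1" "$workdir/sorted_2")
only_in_2=$(LC_ALL=C comm -13 "$workdir/sorted_1" "$workdir/sorted_2")

echo "only in file1: $only_in_1"
echo "only in file2: $only_in_2"

# Clean up the scratch directory.
rm -rf "$workdir"
```

With the sample data, def,456 is not reported at all, since it appears in both files regardless of position. For large files, sort spills to temporary files on disk rather than holding everything in memory, which is likely why this approach copes with inputs that exhaust PHP's memory limit.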
Fredrik Pihl answered Dec 24 '25 11:12



