I have two CSV files of about 134 MB.
All I want to do is get the 'diff' of the two files, except the position of a line doesn't matter.
In other words, let's say I have:
abc,123
def,456
and
def,456
ghi,789
I don't want to be told about def,456. It's in a different position in the second file, but I want it to be counted as not being different.
Just doing diff file1 file2 > outputfile isn't working. What command should I use to do this? I know this is trivial in PHP but I run out of memory quickly. I'd rather just use UNIX command line tools. Diff may not even be the right utility for this.
I would suggest sorting the two input files and then comparing the two sorted versions, something like this:
sort file1 > sorted_1
sort file2 > sorted_2
diff sorted_1 sorted_2
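If the diff output is noisier than you want, comm on the same sorted files may be a closer fit, since it can print only the lines unique to each file. The following is a rough sketch rather than a definitive recipe: it recreates the two sample files from the question in a temporary directory (the directory and the variable names are made up for illustration) and assumes a POSIX sort and comm. LC_ALL=C is set so that both tools agree on the sort order.

```shell
# Work in a scratch directory so no existing files are overwritten (hypothetical setup).
workdir=$(mktemp -d)

# Recreate the two sample files from the question.
printf 'abc,123\ndef,456\n' > "$workdir/file1"
printf 'def,456\nghi,789\n' > "$workdir/file2"

# comm expects sorted input; use the same locale for sort and comm.
LC_ALL=C sort "$workdir/file1" > "$workdir/sorted_1"
LC_ALL=C sort "$workdir/file2" > "$workdir/sorted_2"

# -2 and -3 hide lines unique to the second file and lines common to both,
# leaving only lines unique to the first file; -1 and -3 do the reverse.
only_in_1=$(LC_ALL=C comm -23 "$workdir/sorted_1" "$workdir/sorted_2")
only_in_2=$(LC_ALL=C comm -13 "$workdir/sorted_1" "$workdir/sorted_2")

echo "only in file1: $only_in_1"
echo "only in file2: $only_in_2"
```

With the sample data, def,456 is not reported because it appears in both files. On the real files you would redirect each comm command to an output file instead of capturing it in a variable.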