
Difference between two files without sorting

I have two files, file1 and file2, where file2 is a subset of file1. That is, some lines of file1 also appear in file2 and some do not, but there is no line in file2 that is not in file1. A file may contain several lines with the same content. I want to get the difference between them, that is, all lines of file1 that are not in file2.

According to this well-received answer:

diff(1) isn't the answer, comm(1) is.

(For whatever reason)

As I understand it, though, comm requires both files to be sorted first. The problem: both files are ordered (not sorted!), and this order must be preserved. So what I really want is to iterate over file1 and check, for every line, whether it is also in file2. If it is not, write it to file3. If the same content occurs more than once, it should be kept more than once!

Is there any way to do this with the command line?

Yanick Nedderhoff asked Oct 17 '25 00:10


1 Answer

Try this with GNU grep:

grep -vFf file2 file1 > file3

Update (with -x, so that a pattern only matches a whole line):

grep -vxFf file2 file1 > file3
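
To see what -x changes, here is a small sketch with made-up sample data. The contents of file1 and file2 below are assumptions for illustration, and so is the name file3_substring. With -F alone, each line of file2 is a fixed string that may match anywhere inside a line of file1; with -x it has to match the whole line. In both cases the surviving lines of file1 keep their original order, duplicates included.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical sample data: file2 is a subset of file1, file1 is ordered but not sorted.
printf '%s\n' 'foo' 'foobar' 'bar' 'foo' 'baz' 'baz' > file1
printf '%s\n' 'foo' 'bar' > file2

# -v: print non-matching lines, -F: patterns are fixed strings, -f: read patterns from file2.
# Without -x, "foo" also matches inside "foobar", so that line is dropped too.
grep -vFf file2 file1 > file3_substring

# With -x, a line is dropped only if it equals a line of file2 exactly.
grep -vxFf file2 file1 > file3

cat file3
```

With this data, file3 should contain foobar, baz, baz in that order, while file3_substring contains only baz, baz because foobar contains foo. Note that every occurrence of a line listed in file2 is removed from the output, no matter how often it appears in file2.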
Cyrus answered Oct 19 '25 14:10


