How to find set difference of two files?

Question

I have two files A and B. I want to find all the lines in A that are not in B. What's the fastest way to do this in bash or with standard Linux utilities? Here's what I tried so far:

while IFS= read -r line; do
  # -q: quiet, -x: match the whole line, -F: treat $line as a literal string
  if ! grep -qxF -- "$line" file2; then
    printf '%s\n' "$line"
  fi
done < file1

It works, but it's slow. Is there a faster way of doing this?

Charles Duffy · Accepted Answer

The BashFAQ describes doing exactly this with comm, which is the canonically correct method.

# Subtraction of file1 from file2
# (i.e., only the lines unique to file2)
comm -13 <(sort file1) <(sort file2)
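The question asks for the opposite direction (lines in A that are not in B). comm prints three columns: lines only in the first file, lines only in the second file, and lines in both; -1, -2 and -3 suppress the corresponding column, so -23 keeps only the lines unique to the first file. Below is a minimal sketch, assuming bash (for process substitution); the function name set_difference and the sample data are hypothetical. LC_ALL=C is set on both sort and comm so they agree on collation order, which comm requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of "$1" that do not appear in "$2".
# comm -23 suppresses column 2 (lines only in the second file) and
# column 3 (lines in both), leaving lines unique to the first file.
# Both inputs must be sorted in the collation order comm uses,
# so sort and comm share LC_ALL=C.
set_difference() {
  LC_ALL=C comm -23 <(LC_ALL=C sort -- "$1") <(LC_ALL=C sort -- "$2")
}

# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmp/A"
printf '%s\n' date banana > "$tmp/B"

set_difference "$tmp/A" "$tmp/B"   # prints "apple" then "cherry"
```

Swapping the two arguments gives the other direction, which is what comm -13 computes.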

diff is less appropriate for this task, as it computes an edit script over blocks of lines rather than comparing individual lines; as such, the algorithms it has to use are more complex and less memory-efficient than comm's single pass over two sorted inputs.

comm has been part of the Single Unix Specification since SUS2 (1997).

tonio · Answer

If you simply want the lines that are in file A but not in B, you can sort both files and compare them with diff.

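sort A > A.sorted
sort B > B.sorted
# Skip the "---"/"+++" header lines, keep removed ("-") lines, drop the prefix
diff -u A.sorted B.sorted | tail -n +3 | grep '^-' | cut -c 2-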
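If GNU diffutils is available, diff's line-format options can print only the lines taken from the first file, so there are no header, hunk or prefix characters to filter out afterwards. Below is a minimal sketch, assuming GNU diff (these options are not part of POSIX diff) and bash; the function name only_in_first and the sample data are hypothetical. --minimal is added as a precaution: it asks diff to look harder for a smallest set of changes, and with both inputs sorted, a minimal diff deletes exactly the lines of the first file that are missing from the second.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of "$1" that do not appear in "$2",
# using GNU diff's line formats (assumes GNU diffutils).
# Lines only in the second input and lines common to both are formatted
# as empty strings, so only lines unique to the first input are printed.
# diff exits 1 when the inputs differ, so map that to success and let
# only exit status 2 (trouble) make the function fail.
only_in_first() {
  diff --minimal --new-line-format='' --unchanged-line-format='' \
    <(LC_ALL=C sort -- "$1") <(LC_ALL=C sort -- "$2") || [ $? -eq 1 ]
}

# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmp/A"
printf '%s\n' date banana > "$tmp/B"

only_in_first "$tmp/A" "$tmp/B"   # prints "apple" then "cherry"
```

Because these options are GNU-specific, comm remains the more portable choice.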

Tags: bash, gnu-coreutils

spinlok
