I have two files, A and B. I want to find all the lines in A that are not in B. What's the fastest way to do this in bash or with standard Linux utilities? Here's what I've tried so far:
for line in `cat file1`
do
    # grep -c prints the number of matching lines; 0 means the line is absent from file2
    if [ `grep -c "^$line$" file2` -eq 0 ]; then
        echo "$line"
    fi
done
It works, but it's slow. Is there a faster way of doing this?
The BashFAQ describes doing exactly this with comm, which is the canonically correct method.
# Subtraction of file1 from file2
# (i.e., only the lines unique to file2)
comm -13 <(sort file1) <(sort file2)
diff is less appropriate for this task, as it tries to operate on blocks rather than individual lines; as such, the algorithms it has to use are more complex and less memory-efficient.
comm has been part of the Single Unix Specification since SUS2 (1997).
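For the direction asked in the question (lines in A that are not in B), the flags would be -23 rather than -13. Below is a minimal sketch on made-up sample data. The helper name lines_only_in_first and the sample files are assumptions for illustration, and it sorts into temporary files instead of using <(...) so that it also runs in a plain POSIX shell.

```shell
# lines_only_in_first is a hypothetical helper name, not part of comm.
lines_only_in_first() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    sort "$1" > "$a_sorted"
    sort "$2" > "$b_sorted"
    # -2 hides lines unique to the second file, -3 hides lines common to both,
    # so only column 1 (lines unique to the first file) is printed.
    comm -23 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Hypothetical sample data standing in for A and B.
demo_dir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$demo_dir/A"
printf '%s\n' date banana > "$demo_dir/B"

result=$(lines_only_in_first "$demo_dir/A" "$demo_dir/B")
printf '%s\n' "$result"
# prints:
# apple
# cherry
```

In bash, the body of the helper should be equivalent to the one-liner comm -23 <(sort A) <(sort B). Note that the output comes back in sorted order, not in the original order of A.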
If you simply want the lines that are in file A but not in B, you can sort the files and compare them with diff.
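sort A > A.sorted
sort B > B.sorted
# Skip the two-line "---"/"+++" header, then keep only the lines removed from A.sorted
diff -u A.sorted B.sorted | tail -n +3 | grep '^-'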
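With this approach each reported line still carries diff's leading - marker, and the unified-diff header itself begins with ---. Here is a rough sketch, on made-up sample data, of one way to drop both using sed. The file contents and the variable name only_in_a are assumptions for illustration.

```shell
# Hypothetical sample data standing in for A and B.
work_dir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$work_dir/A"
printf '%s\n' date banana > "$work_dir/B"

sort "$work_dir/A" > "$work_dir/A.sorted"
sort "$work_dir/B" > "$work_dir/B.sorted"

# 1,2d      : delete the two header lines ("--- ..." and "+++ ...")
# s/^-//p   : for lines removed from A.sorted, strip the marker and print them
only_in_a=$(diff -u "$work_dir/A.sorted" "$work_dir/B.sorted" | sed -n '1,2d; s/^-//p')
printf '%s\n' "$only_in_a"
# prints:
# apple
# cherry
```

Stripping only the first character means a line of A that itself starts with a hyphen is preserved intact.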