Subtracting lines in one file from another file

Tags: unix, sed, awk

I couldn't find an answer that truly subtracts one file from another.

My goal is to remove lines in one file that occur in another file. Multiple occurrences should be respected: for example, if a line occurs 4 times in file A and only once in file B, file C should contain 3 of those lines.

File A:

1
3
3
3
4
4

File B:

1
3
4

File C (desired output):

3
3
4
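To reproduce the example, the two input files can be created with printf. The names f1 (File A) and f2 (File B) are an assumption, chosen to match the file names used in the answer:

```shell
# Hypothetical file names: f1 holds File A, f2 holds File B
printf '%s\n' 1 3 3 3 4 4 > f1
printf '%s\n' 1 3 4 > f2
```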

Thanks in advance

asked Mar 06 '17 10:03 by Hawk


1 Answer

In awk, with f1 as File A and f2 as File B:

$ awk 'NR==FNR{a[$0]--;next} ($0 in a) && ++a[$0] > 0' f2 f1
3
3
4

Explained:

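NR==FNR {                  # true only while reading the first file argument (f2)
    a[$0]--;               # tally each line of f2 negatively: a[line] = -(occurrences in f2)
    next                   # skip the rule below for lines of f2
}
($0 in a) && ++a[$0] > 0   # for a line of f1 that also occurs in f2, increment its tally;
                           # print only once the tally is above 0, i.e. after f2's copies are used up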
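To see the counter at work, here is a debugging variant of the same logic. The extra print fields are an illustration, not part of the answer: for each line of f1 that also appears in f2, it reports the tally after incrementing and whether the original rule would print that line. It assumes the question's sample data, with f1 as File A and f2 as File B:

```shell
# Sample data from the question (assumed names: f1 = File A, f2 = File B)
printf '%s\n' 1 3 3 3 4 4 > f1
printf '%s\n' 1 3 4 > f2

# Same counting as the answer, but print the tally instead of filtering on it
awk 'NR==FNR { a[$0]--; next }
     ($0 in a) { n = ++a[$0]; print $0, n, (n > 0 ? "print" : "skip") }' f2 f1
```

For this data the tally of each value starts at -1, so the first copy of 1, 3 and 4 in f1 is cancelled (tally 0, skip) and every later copy is printed. The rows marked print (3, 3, 4) are exactly the lines the one-liner outputs.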

If you also want to print lines of f1 that do not occur in f2 at all, drop the ($0 in a) && guard:

$ echo 5 >> f1
$ awk 'NR==FNR{a[$0]--;next} (++a[$0] > 0)' f2 f1
3
3
4
5
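If the original line order does not matter, a different technique (not part of the answer above) is to sort both files and let comm compute the difference. comm -23 suppresses column 2 (lines only in the second file) and column 3 (lines common to both). Because comm walks the two sorted files in step, each line of f2 cancels at most one matching line of f1, so duplicates are respected; the result comes out in sorted order. A minimal sketch, assuming the same f1 (with the appended 5) and f2, and forcing LC_ALL=C so that sort and comm agree on collation:

```shell
# Sample data: f1 is File A plus the extra 5, f2 is File B (assumed names)
printf '%s\n' 1 3 3 3 4 4 5 > f1
printf '%s\n' 1 3 4 > f2

# Sort both inputs with the same collation comm will use
LC_ALL=C sort f1 > f1.sorted
LC_ALL=C sort f2 > f2.sorted

# Keep only column 1: lines of f1 left over after matching them against f2
LC_ALL=C comm -23 f1.sorted f2.sorted
```

This prints 3, 3, 4 and 5, the same lines as the awk variant without the guard, but it cannot preserve the order of f1 the way the awk version does.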
answered Sep 22 '22 03:09 by James Brown