I have a (large) CSV file (A) that has this structure:
1234ABC 456789
1235ABD 098732
1235ABE 098731
1235ABF 198731
Another file (B) contains entries that should be removed from A:
1234ABC
1235ABE
I want to run an awk or sed command (or some command-line script if awk or sed are not sufficient) that removes all lines from A whose first column is equal to a line in B. I.e. the result in A after the script has run should be:
1235ABD 098732
1235ABF 198731
Note that it's not enough to simply remove a line in A that starts with any of the lines in B. For example, if A contains:
1235AC 456789
1235A 098732
and B contains:
1235A
then A should contain this afterwards:
1235AC 456789
How can I achieve this in bash, preferably using awk or sed (or a shell script if required)?
You may use this awk:
awk 'NR == FNR {dels[$1]; next} !($1 in dels)' file2.csv file1.csv
1235ABD 098732
1235ABF 198731
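This is the standard two-file awk idiom. While the first file (file2.csv) is being read, NR == FNR is true, so the first field of each of its lines is stored as a key in the array dels.
While the second file (file1.csv) is being read, we just print the lines where $1 doesn't exist as a key in dels. Because this is an exact key lookup, 1235A does not match 1235AC.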
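To get the result back into A itself (as the question asks), one option is to go through a temporary file, since redirecting awk's output straight onto its own input file would truncate it before it is read. A minimal sketch using the prefix example from the question; the scratch directory and the names A, B and A.tmp are assumptions made for this demo only:

```shell
#!/bin/sh
# Hypothetical demo files in a scratch directory (not from the original post)
workdir=$(mktemp -d)

printf '%s\n' '1235AC 456789' '1235A 098732' > "$workdir/A"
printf '%s\n' '1235A' > "$workdir/B"

# While reading B (NR == FNR): remember each key.
# While reading A: print only lines whose first field is not a remembered key.
# Replace A only if awk succeeded.
awk 'NR == FNR {dels[$1]; next} !($1 in dels)' "$workdir/B" "$workdir/A" \
    > "$workdir/A.tmp" && mv "$workdir/A.tmp" "$workdir/A"

result=$(cat "$workdir/A")
printf '%s\n' "$result"   # prints: 1235AC 456789
```

One caveat worth checking before using this on real data: if B is empty, NR == FNR stays true while A is being read, so nothing is printed and A would end up empty.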
$ cat fileA
1234ABC 456789
1235ABD 098732
1235ABE 098731
1235ABF 198731
1235AC 456789
1235A 098732
$ cat fileB
1234ABC
1235ABE
1235A
One grep idea using inverted word matches from file fileB:
$ grep -vwf fileB fileA
1235ABD 098732
1235ABF 198731
1235AC 456789
NOTE: this applies the match across the entire line (i.e., not just the first column), so it likely won't be accurate if the entries from fileB can show up in columns 2 through N of fileA.
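To make that caveat concrete, here is a small sketch comparing the grep idea with a first-column awk lookup on made-up data where a key from fileB also appears in column 2. The file contents and the scratch directory are hypothetical and chosen only to trigger the problem:

```shell
#!/bin/sh
# Hypothetical data: the key 1235A also occurs in column 2 of the first line
demo=$(mktemp -d)
printf '%s\n' '1235ABD 1235A' '1235A 098732' > "$demo/fileA"
printf '%s\n' '1235A' > "$demo/fileB"

# Word match anywhere on the line: both lines contain the word 1235A,
# so both are dropped and nothing is printed
grep_out=$(grep -vwf "$demo/fileB" "$demo/fileA")

# Exact lookup on the first column only: the first line survives
awk_out=$(awk 'NR == FNR {dels[$1]; next} !($1 in dels)' "$demo/fileB" "$demo/fileA")

printf 'grep: [%s]\n' "$grep_out"   # prints: grep: []
printf 'awk:  [%s]\n' "$awk_out"    # prints: awk:  [1235ABD 1235A]
```

So the grep approach is fine when the keys can only ever occur in the first column, and the awk lookup is the safer choice otherwise.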