
Remove lines from a CSV file that match lines from another file in bash? [duplicate]

I have a (large) CSV file (A) that has this structure:

1234ABC 456789
1235ABD 098732
1235ABE 098731
1235ABF 198731

Another file (B) contains entries that should be removed from A:

1234ABC
1235ABE

I want to run an awk or sed command (or some other command-line script if awk or sed are not sufficient) that removes all lines from A whose first column is equal to a line in B. That is, after the script has run, A should contain:

1235ABD 098732
1235ABF 198731

Note that it's not enough to simply remove a line in A that starts with any of the lines in B. For example, if A contains:

1235AC 456789
1235A 098732

and B contains:

1235A

then A should contain this afterwards:

1235AC 456789

How can I achieve this in bash, preferably using awk or sed (or a shell script if required)?

Johan asked Sep 19 '25 22:09


2 Answers

You may use this awk command:

awk 'NR == FNR {dels[$1]; next} !($1 in dels)' file2.csv file1.csv

1235ABD 098732
1235ABF 198731

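This is a standard two-pass awk command. Here file2.csv is the list of entries to remove (B) and file1.csv is the data file (A). In the first pass, NR == FNR is true only while awk is reading the first file, so the first field of every line in file2.csv is stored as a key in the array dels.

In the second pass we just print the lines from file1.csv whose $1 is not a key in dels. Because this is an exact key lookup rather than a prefix match, 1235A in file2.csv does not remove a line whose first column is 1235AC.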
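The command above only prints the filtered lines, while the question asks for A itself to be updated. A minimal sketch of one way to do that, assuming the files are named A and B as in the question and that writing through a temporary file is acceptable (the scratch directory and variable names are illustrative, not part of the answer):

```shell
# Scratch copies of the question's sample data; the names A and B are illustrative.
dir=$(mktemp -d)
printf '%s\n' '1234ABC 456789' '1235ABD 098732' '1235ABE 098731' '1235ABF 198731' > "$dir/A"
printf '%s\n' '1234ABC' '1235ABE' > "$dir/B"

# Filter A against B into a temporary file, and replace A only if awk succeeded.
# Redirecting straight back into A would truncate it before awk reads it.
tmp=$(mktemp) &&
  awk 'NR == FNR {dels[$1]; next} !($1 in dels)' "$dir/B" "$dir/A" > "$tmp" &&
  mv "$tmp" "$dir/A"

cat "$dir/A"
```

One caveat worth checking before using this on real data: if B is empty, NR == FNR is also true while awk reads A, so nothing is printed and A would end up empty. A guard such as testing that B is non-empty first would avoid that.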

anubhava answered Sep 22 '25 10:09


$ cat fileA
1234ABC 456789
1235ABD 098732
1235ABE 098731
1235ABF 198731
1235AC 456789
1235A 098732

$ cat fileB
1234ABC
1235ABE
1235A

One grep idea, using inverted whole-word matches (-v, -w) with the patterns read from fileB (-f):

$ grep -vwf fileB fileA
1235ABD 098732
1235ABF 198731
1235AC 456789

NOTE: this applies the match across the entire line (i.e., not just the first column), so it likely won't be accurate if entries from fileB can also show up in columns 2 through N of fileA.
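If grep is still preferred, one possible way around that limitation is to turn each entry of fileB into a pattern anchored to the first column. This is only a sketch, assuming the entries contain no regex metacharacters and that columns are separated by a single space; the patterns file, the scratch directory, and the extra line 9999XYZ 1235A (added to show a match in column 2 being ignored) are illustrative and not part of the original data:

```shell
# Scratch copies of the sample data, plus one hypothetical line whose
# second column equals an entry of fileB.
dir=$(mktemp -d)
printf '%s\n' '1234ABC 456789' '1235ABD 098732' '1235ABE 098731' \
  '1235ABF 198731' '1235AC 456789' '1235A 098732' '9999XYZ 1235A' > "$dir/fileA"
printf '%s\n' '1234ABC' '1235ABE' '1235A' > "$dir/fileB"

# Rewrite each entry X as the extended regex ^X( |$): X must start the line
# and be followed by a space or by the end of the line.
sed 's/.*/^&( |$)/' "$dir/fileB" > "$dir/patterns"

# Print only the lines of fileA that match none of the anchored patterns.
result=$(grep -vE -f "$dir/patterns" "$dir/fileA")
printf '%s\n' "$result"
```

With the sample data this keeps 1235AC 456789 (not an exact first-column match) and 9999XYZ 1235A (the match is in column 2), whereas the -w approach would drop the latter.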

markp-fuso answered Sep 22 '25 12:09