I have two files:
candidates.csv:
id,value
1,123
4,1
2,5
50,5
blacklist.csv:
1
2
5
3
10
I'd like to remove all rows from candidates.csv in which the first column (id) has a value contained in blacklist.csv. id is always numeric. In this case I'd like my output to look like this:
id,value
4,1
50,5
So far, my script for identifying the duplicate lines looks like this:
cat candidates.csv | cut -d \, -f 1 | grep -f blacklist.csv -w
This gives me the output:
1
2
Now I somehow need to pipe this information back into sed/awk/gawk/... to delete the duplicates, but I don't know how. Any ideas how I can continue from here? Or is there a better solution altogether? My only restriction is that it has to run in bash.
What about the following:
awk -F, '(NR==FNR){a[$1];next}!($1 in a)' blacklist.csv candidates.csv
How does this work?
An awk program is a series of pattern-action pairs, written as:
condition { action }
condition { action }
...
where condition is typically an expression and action a series of commands. Here, the two condition-action pairs read:

(NR==FNR){a[$1];next}

If the total record count NR equals the record count of the current file FNR (i.e. if we are reading the first file), store the first field as a key in array a and skip to the next record (do not do anything else).

!($1 in a)

If the first field is not a key in array a, perform the default action, which is to print the line. This only takes effect on the second file, where the condition of the first condition-action pair no longer holds.
If you are not too concerned about the order of the lines in your candidates.csv file, you could use the following:
join -v 1 -t, <(sort -t, candidates.csv) <(sort blacklist.csv)
The -v 1 requests all the lines from the first file (the sorted candidates.csv) which do not match on the first field with the second file (the blacklist.csv). The -t, just sets the comma as the field separator.
If you're concerned about the header line in the candidates.csv file, you could strip it before the sorting or change the order.
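For example, a rough sketch that keeps the header in place (assuming the header is always the first line of candidates.csv):

{ head -n 1 candidates.csv; join -v 1 -t, <(tail -n +2 candidates.csv | sort -t,) <(sort blacklist.csv); }

This prints the header first and joins only the data rows, so the header never gets mixed into the sorted output; the process substitutions keep it within plain bash.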