csv file filtering

Tags:

sed

awk

I have a .csv file with a header row like so:

headerA,headerB,headerC
bill,jones,p
mike,smith,f
sally,silly,p

I'd like to filter out any records with the f value in the headerC column.

Can I do that with sed or awk?

Ben asked Dec 21 '22 12:12


2 Answers

If the name of the third header column is not just f:

sed '/,f$/d' FILE

will do (it deletes every line that ends with ,f).

If it is, I'd go with:
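A quick way to try this on the sample data from the question without creating a file. The shell function `sample` is a name made up for this demo and simply stands in for FILE:

```shell
# Hypothetical helper: prints the sample CSV from the question.
sample() {
    printf '%s\n' 'headerA,headerB,headerC' 'bill,jones,p' 'mike,smith,f' 'sally,silly,p'
}

# Delete every line ending in ",f"; the header survives because it ends in ",headerC".
sample | sed '/,f$/d'
```

This should print the header plus the bill and sally rows.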

sed -n -e '1p;/,[^f]$/p' FILE

(-n turns off the default printing; 1p prints the 1st line, and /,[^f]$/p prints every line that ends with a comma followed by a single character other than f. Note: this will not work if the 3rd column contains more than one character.)
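If the 3rd column can hold more than one character, a variant that always keeps line 1 and deletes the other lines ending in ,f should work. This is a swapped-in alternative, not the command above: `1b` branches to the end of the script on the first line, so sed prints that line and never reaches the delete. It still assumes headerC is the last column and that no field contains quoted commas; `sample` is again a hypothetical stand-in for FILE, here with a header whose third name is f and a multi-character value:

```shell
# Hypothetical sample: third header name is "f", and bill's 3rd field is two characters.
sample() {
    printf '%s\n' 'headerA,headerB,f' 'bill,jones,pp' 'mike,smith,f' 'sally,silly,p'
}

# Line 1: branch to end of script (auto-printed). Other lines: delete if they end in ",f".
sample | sed -e '1b' -e '/,f$/d'
```

Unlike the -n/[^f] version, this keeps the bill,jones,pp row.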

And an awk one:

awk -F, 'NR == 1; NR > 1 && $3 != "f"' FILE

(This always prints the first line: NR == 1 is true there, so the default action, print $0, runs. The second condition checks that we are past the 1st line and that the 3rd field is not f, and then uses the default action again.)
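The two awk patterns can also be folded into one, with the value to drop passed in through awk's `-v` option instead of being hard-coded. `val` is a variable name picked for this sketch, and `sample` is a hypothetical stand-in for FILE whose third header name is literally f, to show that `NR == 1` keeps the header anyway:

```shell
# Hypothetical sample: the third header name is "f".
sample() {
    printf '%s\n' 'headerA,headerB,f' 'bill,jones,p' 'mike,smith,f' 'sally,silly,p'
}

# Print line 1 unconditionally, otherwise print only when field 3 differs from val.
sample | awk -F, -v val=f 'NR == 1 || $3 != val'
```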

HTH

Zsolt Botykai answered Jan 01 '23 22:01


Well, if you know that headerC is always the third column, the following sed command would work:

sed -r '/^[^,]*,[^,]*,f(,|$)/d' < file.csv > filefiltered.csv

And the following awk command does the same:

awk 'BEGIN {FS=","} {if($3 != "f") print}' file.csv

If you don't know that headerC is always in a particular column, it gets a little trickier. Does this work?
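One way to handle headerC being in an unknown column is to look up its position in the header row first and then compare that field on every later row. This is a minimal sketch, assuming plain comma-separated data with no quoted fields containing commas; `filter_by_header` and `sample` are hypothetical names used only here, and the sample rows are the question's data with the columns reordered:

```shell
# Hypothetical sample: the question's data with headerC moved to the first column.
sample() {
    printf '%s\n' 'headerC,headerA,headerB' 'p,bill,jones' 'f,mike,smith' 'p,sally,silly'
}

# filter_by_header NAME VALUE: print the header, then every row whose NAME
# column is not equal to VALUE. Exits with status 1 if NAME is not in the header.
filter_by_header() {
    awk -F, -v name="$1" -v val="$2" '
        NR == 1 {
            # Find the index of the column called "name" in the header row.
            for (i = 1; i <= NF; i++)
                if ($i == name) col = i
            if (!col) {
                print "column " name " not found" > "/dev/stderr"
                exit 1
            }
            print
            next
        }
        # Default action (print) for rows whose column differs from val.
        $col != val
    '
}

sample | filter_by_header headerC f
```

Because the column index comes from the header, the same call keeps working if the columns are reordered.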

Michael Lowman answered Jan 01 '23 22:01