Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse a csv using awk and ignoring commas inside a field

Tags:

csv

awk

I have a csv file where each row defines a room in a given building. Along with room, each row has a floor field. What I want to extract is all floors in all buildings.

My file looks like this...

"u_floor","u_room","name" 0,"00BDF","AIRPORT TEST            " 0,0,"BRICKER HALL, JOHN W    " 0,3,"BRICKER HALL, JOHN W    " 0,5,"BRICKER HALL, JOHN W    " 0,6,"BRICKER HALL, JOHN W    " 0,7,"BRICKER HALL, JOHN W    " 0,8,"BRICKER HALL, JOHN W    " 0,9,"BRICKER HALL, JOHN W    " 0,19,"BRICKER HALL, JOHN W    " 0,20,"BRICKER HALL, JOHN W    " 0,21,"BRICKER HALL, JOHN W    " 0,25,"BRICKER HALL, JOHN W    " 0,27,"BRICKER HALL, JOHN W    " 0,29,"BRICKER HALL, JOHN W    " 0,35,"BRICKER HALL, JOHN W    " 0,45,"BRICKER HALL, JOHN W    " 0,59,"BRICKER HALL, JOHN W    " 0,60,"BRICKER HALL, JOHN W    " 0,61,"BRICKER HALL, JOHN W    " 0,63,"BRICKER HALL, JOHN W    " 0,"0006M","BRICKER HALL, JOHN W    " 0,"0008A","BRICKER HALL, JOHN W    " 0,"0008B","BRICKER HALL, JOHN W    " 0,"0008C","BRICKER HALL, JOHN W    " 0,"0008D","BRICKER HALL, JOHN W    " 0,"0008E","BRICKER HALL, JOHN W    " 0,"0008F","BRICKER HALL, JOHN W    " 0,"0008G","BRICKER HALL, JOHN W    " 0,"0008H","BRICKER HALL, JOHN W    " 

What I want is all floors in all buildings.

I am using cat, awk, sort and uniq to obtain this list although I am having a problem with the "," in the building name field such as "BRICKER HALL, JOHN W" and it is throwing off my entire csv generation.

cat Buildings.csv | awk -F, '{print $1","$2}' | sort | uniq > Floors.csv  

How can I get awk to use the comma but ignore a comma in between "" of a field? Alternatively, does someone have a better solution?

Based on the answer provided suggesting a awk csv parser I was able to get the solution:

cat Buildings.csv | awk -f csv.awk | awk -F" -> 2|"  '{print $2}' | awk -F"|" '{print $2","$3}' | sort | uniq > floors.csv  

There we want to use the csv awk program and then from there I want to use a " -> 2|" which is formatting based on the csv awk program. The print $2 there prints only the csv parsed contents, this is because the program prints the original line followed by " -> #" where # is the count parsed from csv. (Ie. the columns.) From there I can split this awk csv result on the "|" whcih is what it replaces the comma's with. Then the sort, uniq and pipe out to a file and done!

Thanks for the help.

like image 205
Chris Avatar asked Nov 17 '10 14:11

Chris


People also ask

How do I avoid commas in a CSV file?

Re: Handling 'comma' in the data while writing to a CSV. So for data fields that contain a comma, you should just be able to wrap them in a double quote. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.

How do I use a CSV file AWK?

More installation instructions found in the readme. And you can pretend that AWK natively supports CSV files. ( You can use this same trick with other UNIX line-oriented tools. head , tail and sort don't understand CSV either, but if you wrap them in csvquote you will be able to handle delimited line breaks correctly.)

Does AWK work on CSV?

You can use AWK to quickly look at a column of data in a CSV file.

What is FPAT in AWK?

For a CSV file the FPAT value is: FPAT = "([^,]+)|(\"[^\"]+\")" Using the data: abc,"pqr,mno" The first grouped expression evaluates to everything i.e. not a comma, this should take "abc" as data then fail for the first occurrence of comma.


1 Answers

gawk -vFPAT='[^,]*|"[^"]*"' '{print $1 "," $3}' | sort | uniq 

This is an awesome GNU Awk 4 extension, where you define a field pattern instead of a field-separator pattern. Does wonders for CSV. (docs)

ETA (thanks mitchus): To remove the surrounding quotes, gsub("^\"|\"$","",$3); if there's more fields than just $3 to process that way, just loop through them.
Note this simple approach is not tolerant of malformed input, nor of some possible special characters between quotes – covering all of those would go beyond the scope of a neat one-liner.

like image 126
hemflit Avatar answered Sep 19 '22 04:09

hemflit