Parse a CSV using awk, ignoring commas inside a field

Tags:

awk

I have a csv file where each row defines a room in a given building. Along with room, each row has a floor field. What I want to extract is all floors in all buildings.

My file looks like this...

"u_floor","u_room","name"
0,"00BDF","AIRPORT TEST            "
0,0,"BRICKER HALL, JOHN W    "
0,3,"BRICKER HALL, JOHN W    "
0,5,"BRICKER HALL, JOHN W    "
0,6,"BRICKER HALL, JOHN W    "
0,7,"BRICKER HALL, JOHN W    "
0,8,"BRICKER HALL, JOHN W    "
0,9,"BRICKER HALL, JOHN W    "
0,19,"BRICKER HALL, JOHN W    "
0,20,"BRICKER HALL, JOHN W    "
0,21,"BRICKER HALL, JOHN W    "
0,25,"BRICKER HALL, JOHN W    "
0,27,"BRICKER HALL, JOHN W    "
0,29,"BRICKER HALL, JOHN W    "
0,35,"BRICKER HALL, JOHN W    "
0,45,"BRICKER HALL, JOHN W    "
0,59,"BRICKER HALL, JOHN W    "
0,60,"BRICKER HALL, JOHN W    "
0,61,"BRICKER HALL, JOHN W    "
0,63,"BRICKER HALL, JOHN W    "
0,"0006M","BRICKER HALL, JOHN W    "
0,"0008A","BRICKER HALL, JOHN W    "
0,"0008B","BRICKER HALL, JOHN W    "
0,"0008C","BRICKER HALL, JOHN W    "
0,"0008D","BRICKER HALL, JOHN W    "
0,"0008E","BRICKER HALL, JOHN W    "
0,"0008F","BRICKER HALL, JOHN W    "
0,"0008G","BRICKER HALL, JOHN W    "
0,"0008H","BRICKER HALL, JOHN W    "

What I want is all floors in all buildings.

I am using cat, awk, sort and uniq to obtain this list, but I am having a problem with the "," inside the building name field, as in "BRICKER HALL, JOHN W": awk treats it as a field separator, and that throws off my entire CSV generation.

cat Buildings.csv | awk -F, '{print $1","$2}' | sort | uniq > Floors.csv

How can I get awk to use the comma but ignore a comma in between "" of a field? Alternatively, does someone have a better solution?
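To make the problem concrete, here is a minimal sketch using one row from the data above (with the padding inside the quotes trimmed). The helper names count_fields and third_field are made up for illustration. With -F, awk splits on every comma, so the row has one field too many and the name is cut at the embedded comma.

```shell
# One data row from the file above (padding inside the quotes trimmed).
row='0,3,"BRICKER HALL, JOHN W"'

# Count the fields awk sees when it splits on every comma.
count_fields() {
    printf '%s\n' "$1" | awk -F, '{ print NF }'
}

# Show the third field as awk sees it.
third_field() {
    printf '%s\n' "$1" | awk -F, '{ print $3 }'
}

count_fields "$row"    # prints 4, not 3
third_field "$row"     # prints "BRICKER HALL (cut at the embedded comma)
```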

Based on the answer suggesting an awk CSV parser, I was able to get to this solution:

cat Buildings.csv | awk -f csv.awk | awk -F" -> 2|"  '{print $2}' | awk -F"|" '{print $2","$3}' | sort | uniq > floors.csv

Here I run the csv awk program first and then split its output on " -> 2|", a delimiter that comes from the program's output format: it prints the original line followed by " -> #", where # is the number of columns parsed from the CSV. Printing $2 therefore keeps only the parsed contents. I then split that result on "|", which is what the program replaces the commas with, and finish with sort, uniq and a redirect to a file. Done!

Thanks for the help.

205

asked Nov 17 '10 14:11

Chris

1 Answer

gawk -vFPAT='[^,]*|"[^"]*"' '{print $1 "," $3}' | sort | uniq

This uses FPAT, an awesome GNU Awk 4 extension that lets you define a pattern for the fields themselves instead of a pattern for the field separator. It does wonders for CSV. (See "Defining Fields by Content" in the gawk manual.)

ETA (thanks mitchus): To remove the surrounding quotes, use gsub("^\"|\"$","",$3); if there are more fields than just $3 to process that way, loop through them.
Note this simple approach is not tolerant of malformed input, nor of some possible special characters between quotes – covering all of those would go beyond the scope of a neat one-liner.

126

answered Sep 19 '22 04:09

hemflit
