I am using awk to perform counting the sum of one column in the csv file. The data format is something like:
id, name, value 1, foo, 17 2, bar, 76 3, "I am the, question", 99
I was using this awk script to count the sum:
awk -F, '{sum+=$3} END {print sum}'
Some of the value in name field contains comma and this break my awk script. My question is: can awk solve this problem? If yes, and how can I do that?
Thank you.
More installation instructions found in the readme. And you can pretend that AWK natively supports CSV files. ( You can use this same trick with other UNIX line-oriented tools. head , tail and sort don't understand CSV either, but if you wrap them in csvquote you will be able to handle delimited line breaks correctly.)
You need to specify text qualifiers. Generally a double quote (") is used as text qualifiers. All the text is always put inside it and all the commas inside a text qualifier is ignored. This is a standard method for all CSV, languages and all platforms for properly handling the text.
The unquoted commas in the print part tell AWK to use the chosen field separator (in this case, a comma) when printing. Note the extra commas in quotes – that's how the blank columns are added. First a comma is printed, then $1, then a comma, then $2...
A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
One way using GNU awk
and FPAT
awk 'BEGIN { FPAT = "([^, ]+)|(\"[^\"]+\")" } { sum+=$3 } END { print sum }' file.txt
Result:
192
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With