How do you parse a CSV file using gawk? Simply setting FS=","
is not enough, as a quoted field with a comma inside will be treated as multiple fields.
Example using FS=","
which does not work:
file contents:
one,two,"three, four",five
"six, seven",eight,"nine"
gawk script:
BEGIN { FS="," }
{
for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)
printf "---------------------------\n"
}
bad output:
field #1: one
field #2: two
field #3: "three
field #4: four"
field #5: five
---------------------------
field #1: "six
field #2: seven"
field #3: eight
field #4: "nine"
---------------------------
desired output:
field #1: one
field #2: two
field #3: "three, four"
field #4: five
---------------------------
field #1: "six, seven"
field #2: eight
field #3: "nine"
---------------------------
You can use AWK to quickly look at a column of data in a CSV file.
The Comma Separated Values (CSV) Parser reads and writes data in a CSV format. Note: In the Config Editor, the parameters are set in the Parser tab of the Connector.
The FPAT variable offers a solution for cases like this. The value of FPAT should be a string that provides a regular expression. This regular expression describes the contents of each field.
The gawk version 4 manual says to use FPAT = "([^,]*)|(\"[^\"]+\")"
When FPAT
is defined, it disables FS
and specifies fields by content instead of by separator.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With