
Parsing a CSV file using gawk

Tags: bash, csv, awk, gawk

How do you parse a CSV file using gawk? Simply setting FS="," is not enough, as a quoted field with a comma inside will be treated as multiple fields.

Example using FS="," which does not work:

file contents:

one,two,"three, four",five
"six, seven",eight,"nine"

gawk script:

BEGIN { FS="," }
{
  for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)
  printf "---------------------------\n"
}

bad output:

field #1: one
field #2: two
field #3: "three
field #4:  four"
field #5: five
---------------------------
field #1: "six
field #2:  seven"
field #3: eight
field #4: "nine"
---------------------------

desired output:

field #1: one
field #2: two
field #3: "three, four"
field #4: five
---------------------------
field #1: "six, seven"
field #2: eight
field #3: "nine"
---------------------------
MCS asked Nov 24 '08 14:11



1 Answer

The gawk version 4 manual says to use FPAT = "([^,]*)|(\"[^\"]+\")"

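Assigning FPAT overrides FS: gawk then splits each record by what a field's content looks like instead of by what separates the fields. Here a field is either a run of non-comma characters or a double-quoted string, which may itself contain commas.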

BCoates answered Oct 21 '22 06:10