Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

parse a csv file that contains commans in the fields with awk

Tags:

csv

awk

i have to use awk to print out 4 different columns in a csv file. The problem is the strings are in a $x,xxx.xx format. When I run the regular awk command.

awk -F, {print $1} testfile.csv 

my output `ends up looking like

307.00
$132.34
30.23

What am I doing wrong.

"$141,818.88","$52,831,578.53","$52,788,069.53" this is roughly the input. The file I have to parse is 90,000 rows and about 40 columns This is how the input is laid out or at least the parts of it that I have to deal with. Sorry if I made you think this wasn't what I was talking about.

If the input is "$307.00","$132.34","$30.23" I want the output to be in a

$307.00
$132.34
$30.23
like image 990
Dudusmaximus Avatar asked Dec 04 '10 01:12

Dudusmaximus


People also ask

Does AWK work for CSV files?

More installation instructions found in the readme. And you can pretend that AWK natively supports CSV files. ( You can use this same trick with other UNIX line-oriented tools. head , tail and sort don't understand CSV either, but if you wrap them in csvquote you will be able to handle delimited line breaks correctly.)

How does Unix handle commas in CSV files?

You could use the "tr" command to translate all commas to a space. The "tr" command reads from standard input. To have "tr" read from a file, redirect standard input from the file. Finally, redirect standard output to another file.

What is FPAT in AWK?

For a CSV file the FPAT value is: FPAT = "([^,]+)|(\"[^\"]+\")" Using the data: abc,"pqr,mno" The first grouped expression evaluates to everything i.e. not a comma, this should take "abc" as data then fail for the first occurrence of comma.

How do you put commas in AWK?

The unquoted commas in the print part tell AWK to use the chosen field separator (in this case, a comma) when printing. Note the extra commas in quotes – that's how the blank columns are added. First a comma is printed, then $1, then a comma, then $2...


1 Answers

Oddly enough I had to tackle this problem some time ago and I kept the code around to do it. You almost had it, but you need to get a bit tricky with your field separator(s).

awk -F'","|^"|"$' '{print $2}' testfile.csv 

Input

# cat testfile.csv
"$141,818.88","$52,831,578.53","$52,788,069.53"
"$2,558.20","$482,619.11","$9,687,142.69"
"$786.48","$8,568,159.41","$159,180,818.00"

Output

# awk -F'","|^"|"$' '{print $2}' testfile.csv
$141,818.88
$2,558.20
$786.48

You'll note that the "first" field is actually $2 because of the field separator ^". Small price to pay for a short 1-liner if you ask me.

like image 112
SiegeX Avatar answered Oct 28 '22 08:10

SiegeX