 

How do you parse comma-separated values (CSV) with awk?

Tags: shell, unix, xml, csv, awk

I am trying to write an awk script to convert a CSV-formatted spreadsheet into XML for Bugzilla bugs. The input CSV (created from an XLS spreadsheet and saved as CSV) has the following format:

tag_1,tag_2,...,tag_N
value1_1,value1_2,...,value1_N
value2_1,value2_2,...,value2_N
valueM_1,valueM_2,...,valueM_N

Each field of the header row names an XML tag. Converted to XML, the file above should look as follows:

<element>
    <tag_1>value1_1</tag_1>
    <tag_2>value1_2</tag_2>
    ...
    <tag_N>value1_N</tag_N>
</element>
<element>
    <tag_1>value2_1</tag_1>
    <tag_2>value2_2</tag_2>
    ...
    <tag_N>value2_N</tag_N>
</element>
...

Here is the awk script I have for this:

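BEGIN {OFS = "\n"}    # runs before any input; OFS is unused below, and FS keeps its default (runs of blanks/tabs)
NR == 1 {for (i = 1; i <= NF; i++)    # record 1 is the header: save each field as a tag name
            tag[i]=$i
         print "<bugzilla version=\"3.4.1\" urlbase=\"http://mozilla.com/\" maintainer=\"[email protected]\" exporter=\"[email protected]\">"}
NR != 1 {print "   <bug>"    # every later record becomes one <bug> element
         for (i = 1; i <= NF; i++)    # one child element per field, named from the header
            print "      <" tag[i] ">" $i "</" tag[i] ">"
         print "   </bug>"}
END {print "</bugzilla>"}    # runs once after the last record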

The actual CSV file is:

cf_foo,cf_bar,short_desc,cf_zebra,cf_pizza,cf_dumpling ,assigned_to,bug_status,cf_word,cf_caslte
ABCD,A-BAR-0032,A NICE DESCRIPTION - help me,pretty,Pepperoni,,,NEW,,

The actual output is:

$ awk -f csvtobugs.awk bugs.csv

<bugzilla version="3.4.1" urlbase="http://mozilla.com/" maintainer="[email protected]" exporter="[email protected]">
   <bug>
      <cf_foo,cf_bar,short_desc,cf_zebra,cf_pizza,cf_dumpling>ABCD,A-BAR-0032,A</cf_foo,cf_bar,short_desc,cf_zebra,cf_pizza,cf_dumpling>
      <,assigned_to,bug_status,cf_word,cf_caslte>NICE</,assigned_to,bug_status,cf_word,cf_caslte>
      <>DESCRIPTION</>
      <>-</>
      <>help</>
      <>me,pretty,Pepperoni,,,NEW,,</>
   </bug>
   <bug>
   </bug>
</bugzilla>

Clearly, this is not the intended result. (I admit I copy-pasted this script from this forum: http://www.unix.com/shell-programming-scripting/21404-csv-xml.html.) The problem is that it's been SOOOOO long since I last looked at awk scripts, and I have NO IDEA what the syntax means.

asked Jan 28 '26 14:01 by les2


1 Answer

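You need to set FS = "," in the BEGIN rule so that awk uses a comma as the field separator. Without it, awk falls back to its default field separator, which splits on runs of spaces and tabs; that is why your output is chopped up at every space instead of at every comma. The code as you show it would work for a file whose fields are separated by tabs (provided no value contains a space), which is a different (also popular) convention in files that are often still called "CSV" even when commas aren't used ;-).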
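Putting that advice into the question's script gives something like the sketch below. It is a minimal version, assuming a simple file where no value contains a comma. csv_to_bugs is a hypothetical wrapper name, the <bugzilla> root is written without the original's attributes for brevity, and the carriage-return strip and blank-line skip are extra assumptions: spreadsheet exports often use Windows line endings, and the empty second <bug> in the output suggests (but does not prove) a trailing blank line.

```shell
#!/bin/sh
# csv_to_bugs is a hypothetical wrapper: CSV on stdin, Bugzilla-style XML on stdout.
# Assumes plain comma-separated data: no quoted fields that contain commas.
csv_to_bugs() {
  awk '
    BEGIN { FS = "," }                # split records on commas instead of blanks/tabs
    { sub(/\r$/, "") }                # strip a trailing carriage return (Windows line endings)
    NF == 0 { next }                  # skip blank lines, such as a trailing empty line
    !seen_header {                    # first non-blank line is the header row
      for (i = 1; i <= NF; i++) {
        tag[i] = $i
        gsub(/^[ \t]+|[ \t]+$/, "", tag[i])   # trim stray spaces, e.g. "cf_dumpling "
      }
      seen_header = 1
      print "<bugzilla>"              # attributes from the original script omitted here
      next
    }
    {                                 # every data row becomes one <bug> element
      print "   <bug>"
      for (i = 1; i <= NF; i++)
        print "      <" tag[i] ">" $i "</" tag[i] ">"
      print "   </bug>"
    }
    END { print "</bugzilla>" }
  '
}
```

Run it as csv_to_bugs < bugs.csv > bugs.xml. Tracking the header with a flag rather than NR == 1 keeps the tag names correct even if blank lines precede the header row.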

answered Jan 31 '26 03:01 by Alex Martelli
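Setting FS = "," is enough for the file in the question, but it breaks as soon as a value contains a comma: spreadsheet exports normally wrap such a value in double quotes and write a literal quote as two quotes, and a plain field separator cannot see those quotes. As a rough sketch, assuming a POSIX awk, a small quote-aware splitter can walk each line character by character. csv_fields and parse_csv are hypothetical names, and quoted values that contain line breaks are not handled. GNU awk users may prefer its FPAT variable, or the --csv option in recent versions, over a hand-written parser.

```shell
#!/bin/sh
# csv_fields is a hypothetical helper: prints every field of CSV on stdin as "record:field:value".
# parse_csv (also hypothetical) understands double-quoted fields and "" escapes,
# but not quoted fields that span several lines.
csv_fields() {
  awk '
    # Split line into fld[1..n] and return n.
    function parse_csv(line, fld,    n, i, c, inq, cur) {
      n = 0; cur = ""; inq = 0
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (inq) {
          if (c == "\"") {
            if (substr(line, i + 1, 1) == "\"") {   # doubled quote inside quotes is a literal quote
              cur = cur "\""
              i++
            } else {
              inq = 0                             # closing quote
            }
          } else {
            cur = cur c                           # commas inside quotes are kept
          }
        } else if (c == "\"") {
          inq = 1                                 # opening quote
        } else if (c == ",") {
          fld[++n] = cur                          # unquoted comma ends the field
          cur = ""
        } else {
          cur = cur c
        }
      }
      fld[++n] = cur                              # last field
      return n
    }
    {
      sub(/\r$/, "")                              # tolerate Windows line endings
      n = parse_csv($0, f)
      for (j = 1; j <= n; j++)
        print NR ":" j ":" f[j]
    }
  '
}
```

To use this in the Bugzilla script, n = parse_csv($0, f) would replace the built-in splitting, with n and f[i] taking the place of NF and $i.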


