I need to delete two columns from a comma-separated values (CSV) file. Consider the following lines in the CSV file:
"[email protected],www.example.com",field2,field3,field4
"[email protected]",field2,field3,field4
Now, the result I want at the end:
"[email protected],www.example.com",field4
"[email protected]",field4
I used the following command:
awk 'BEGIN{FS=OFS=","}{print $1,$4}'
But the embedded comma inside the quotes is causing a problem. This is the result I am getting:
"[email protected],field3
"[email protected]",field4
Now my question is: how do I make awk ignore the commas that are inside the double quotes?
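For a CSV file, the FPAT value is: FPAT = "([^,]+)|(\"[^\"]+\")" Take the data: abc,"pqr,mno" The first alternative, [^,]+, matches any run of characters that are not commas, so it takes abc as the first field and stops at the first comma. At the quoted field, the second alternative, \"[^\"]+\", matches the whole quoted string, embedded comma included; because awk keeps the longest match at a given position, it wins over the shorter match "pqr that the first alternative would give. Note that FPAT is a GNU awk (gawk) feature.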
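The matching behaviour described above can also be tried in an awk that has no FPAT. What follows is only a minimal sketch in portable awk, not gawk's own implementation: split_fields is a hypothetical helper name, the pattern uses * instead of + so that empty fields are kept, and it assumes fields contain no escaped quotes or newlines (a trailing empty field is also not reported).

```shell
# split_fields is a hypothetical helper name, not part of awk.
# It reads CSV lines on stdin and prints each field on its own line.
split_fields() {
  awk '{
    line = $0
    while (line != "") {
      # Try a quoted string or a run of non-commas at the start of the line;
      # awk keeps the longest match, so a quoted field wins over its prefix.
      match(line, /^("[^"]*"|[^,]*)/)
      print substr(line, 1, RLENGTH)
      # Drop the field just printed plus the comma that follows it.
      line = substr(line, RLENGTH + 2)
    }
  }'
}

printf '%s\n' 'abc,"pqr,mno"' | split_fields
```

For the sample data this prints abc on the first line and "pqr,mno" on the second, so the embedded comma stays inside its field.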
To remove the ' from the awk output, you can use sed "s/^'//;s/'$//" This command removes the ' only at the beginning and the end of each output line. It is lighter than running another awk pass and more targeted than tr, which would delete every ' in the line.
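As a quick check of that sed expression, here is a small sketch; strip_outer_quotes is a hypothetical wrapper name and the input string is made up for illustration.

```shell
# strip_outer_quotes is a hypothetical wrapper around the sed command above.
# It removes one single quote at the start and one at the end of each line.
strip_outer_quotes() {
  sed "s/^'//;s/'$//"
}

# The quote in the middle of the line is left alone.
printf '%s\n' "'it's ok'" | strip_outer_quotes
```

This prints it's ok, whereas piping the same input through tr -d "'" would also remove the apostrophe inside the text.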
From the GNU awk manual (http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content):
$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, '{print $1,$4}' file
"[email protected],www.example.com",field4
"[email protected]",field4
Also see "What's the most robust way to efficiently parse CSV using awk?" for parsing CSVs more generally, including fields that contain newlines, etc.