How can I replace a comma character in a CSV ONLY when included between " "

Question

I have a CSV file that contains both words and amounts. When an amount is greater than 999, the number is enclosed in double quotes (" ") to distinguish the comma used as the thousands separator from the comma used as the field separator, like this:

black, "1,340.00", brown, white, 150.00, blue
apple, 10.00, bread, coffee, "1,850.00", juice
cat, dog, 995.00, tiger, "2,450.00"

I want to remove the comma ONLY where it is enclosed in " ", leave the other commas (the field separators) alone, and also remove the quotes themselves. The new CSV should look like this:

black, 1340.00, brown, white, 150.00, blue
apple, 10.00, bread, coffee, 1850.00, juice
cat, dog, 995.00, tiger, 2450.00

I played around with sed and awk, but I'm not sure about the best way to achieve this. Thank you!

John1024 · Accepted Answer

$ awk -F\" '{for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' OFS="" input.csv
black, 1340.00, brown, white, 150.00, blue
apple, 10.00, bread, coffee, 1850.00, juice
cat, dog, 995.00, tiger, 2450.00

How it works

-F\"

This tells awk to use double-quotes as the field separator.
for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)

Every even-numbered field is text that was enclosed in double quotes. In those even fields, we remove the commas.
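To make the splitting concrete, here is a minimal sketch that prints every field awk produces for one sample line when " is the field separator. `show_fields` is a hypothetical helper name used only for illustration; the odd fields hold the text outside the quotes and the even field holds the quoted amount.

```shell
# Hypothetical helper: print each field awk sees with " as the separator,
# marking the even-numbered fields, which are the ones inside quotes.
show_fields() {
  awk -F\" '{
    for (i = 1; i <= NF; i++)
      printf "%d%s: [%s]\n", i, (i % 2 == 0 ? " (quoted)" : ""), $i
  }'
}

printf '%s\n' 'black, "1,340.00", brown, white, 150.00, blue' | show_fields
```

For that line awk sees three fields: `black, ` (field 1), `1,340.00` (field 2, quoted) and `, brown, white, 150.00, blue` (field 3).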

This works only because we chose " as the field separator: splitting on the quotes puts each quoted amount in a field of its own.
1

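This is awk's cryptic shorthand for print-the-line. It is a pattern that is always true with no action attached, and awk's default action is to print the current record ($0), including any changes made to its fields.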
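A quick way to convince yourself of the shorthand is to compare `awk '1'` with an explicit `{ print }` on one of the sample lines. This is a small illustrative sketch; the variable names are arbitrary.

```shell
# A pattern with no action gets awk's default action, which prints the record.
input='apple, 10.00, bread, coffee, "1,850.00", juice'

with_pattern=$(printf '%s\n' "$input" | awk '1')         # always-true pattern
with_print=$(printf '%s\n' "$input" | awk '{ print }')   # explicit action
none=$(printf '%s\n' "$input" | awk '0')                 # always-false pattern prints nothing

[ "$with_pattern" = "$with_print" ] && printf 'identical: %s\n' "$with_pattern"
```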
OFS=""

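This tells awk to use an empty string as the field separator on output. When a field is modified, awk rebuilds the line by joining its fields with OFS, so the double quotes that used to separate the fields are not put back. This has the effect of removing the quotes.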
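One subtlety: the line is only guaranteed to be rebuilt with OFS when a field is actually changed, and whether a gsub() that finds nothing to replace counts as a change may depend on the awk implementation. That does not matter for the sample data, where every quoted amount contains a comma, but a hypothetical line such as `fish, "12.50", cake` might keep its quotes. A minimal sketch that sidesteps the question, assuming you want the quotes gone from every quoted field, adds `$1 = $1` to force the rebuild. `unquote_amounts` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the accepted answer's command with "$1 = $1"
# added: assigning a field forces awk to rebuild $0 by joining the fields
# with OFS (""), so the quotes disappear even if gsub() changes nothing.
unquote_amounts() {
  awk -F\" '{
    $1 = $1                          # force $0 to be rebuilt with OFS
    for (i = 2; i <= NF; i += 2)
      gsub(/,/, "", $i)              # even fields were inside quotes
  } 1' OFS="" "$@"
}

printf '%s\n' 'fish, "12.50", cake, "1,000.00"' | unquote_amounts
```

This prints `fish, 12.50, cake, 1000.00`.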

Ed Morton · Answer

$ sed -E 's/"([^"]+),([^"]+)"/\1\2/g' file
black, 1340.00, brown, white, 150.00, blue
apple, 10.00, bread, coffee, 1850.00, juice
cat, dog, 995.00, tiger, 2450.00

The above works only for amounts up to 999,999.99 (which covers the sample input/output) because it removes only one comma from each number.
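To see the limit, note that the one-shot sed command turns `"1,234,567.00"` into `1,234567.00`. If you would rather stay with sed, one way around it (an alternative sketch, not part of this answer) is to loop the substitution with a label and `t` until no comma is left inside quotes, then delete the quotes. The anchored regex only matches a comma that has an odd number of quotes before it on the line, so it assumes quotes are balanced and never escaped. `strip_quoted_commas` is a hypothetical function name.

```shell
# Hypothetical helper: repeatedly delete one comma that sits inside a quoted
# field, branching back to :a while substitutions succeed, then drop quotes.
strip_quoted_commas() {
  sed -E \
    -e ':a' \
    -e 's/^(([^"]*"[^"]*")*[^"]*"[^",]*),/\1/' \
    -e 'ta' \
    -e 's/"//g' \
    "$@"
}

printf '%s\n' 'cat, "1,234,567.00", dog, "2,450.00"' | strip_quoted_commas
```

This prints `cat, 1234567.00, dog, 2450.00`. Separate `-e` expressions are used for the label and the branch so the script does not rely on GNU sed accepting `;` after a label.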

This will work for any number:

awk '{while ( match($0,/([^"]*)("[^"]+")(.*)/,a) ) { gsub(/[",]/,"",a[2]); $0 = a[1] a[2] a[3] } }1' file

The above requires GNU awk for the third argument to match(); with other awks you would use three calls to substr() instead.
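A minimal sketch of that portable variant, assuming a POSIX awk: match() sets RSTART and RLENGTH, and three substr() calls split the line around the first quoted field. This is an illustration rather than code from the answer, and `unquote_amounts_posix` is a hypothetical function name. The loop ends because each pass deletes one pair of quotes.

```shell
# Hypothetical helper: portable rewrite of the gawk match() loop using
# RSTART/RLENGTH instead of the third (array) argument.
unquote_amounts_posix() {
  awk '{
    while (match($0, /"[^"]+"/)) {
      pre  = substr($0, 1, RSTART - 1)      # text before the opening quote
      fld  = substr($0, RSTART, RLENGTH)    # quoted field, quotes included
      post = substr($0, RSTART + RLENGTH)   # text after the closing quote
      gsub(/[",]/, "", fld)                 # drop the quotes and the commas
      $0 = pre fld post
    }
  } 1' "$@"
}

printf '%s\n' 'cat, dog, 995.00, tiger, "2,450.00"' | unquote_amounts_posix
```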

@John1024's answer is the more sensible, concise approach if a field can contain multiple commas, though.

Tags: linux, bash, sed, awk, centos

Marco Falzone