How can I replace a comma character in a CSV ONLY when it is enclosed in " "?

I have a CSV file which contains both words and amounts. When an amount is > 999, the number is enclosed in " " in order to distinguish the comma used as the thousands separator from the comma used as the field separator, like this:

black, "1,340.00", brown, white, 150.00, blue
apple, 10.00, bread, coffee, "1,850.00", juice
cat, dog, 995.00, tiger, "2,450.00"

I wish to remove the comma ONLY where it's enclosed in " ", leaving the other commas (the field separators) intact, and also remove the " " themselves. The output of the new CSV should look like this:

black, 1340.00, brown, white, 150.00, blue 
apple, 10.00, bread, coffee, 1850.00, juice 
cat, dog, 995.00, tiger, 2450.00

I played around with sed and awk but I'm not sure about the best way to achieve it. Thank you!

Marco Falzone asked Oct 26 '25 17:10


2 Answers

$ awk -F\" '{for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' OFS="" input.csv
black, 1340.00, brown, white, 150.00, blue
apple, 10.00, bread, coffee, 1850.00, juice
cat, dog, 995.00, tiger, 2450.00

How it works

  • -F\"

    This tells awk to use double-quotes as the field separator.

  • for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)

    Every even-numbered field is text that was inside double quotes. For those even fields, we remove the commas.

    This only works because we chose " as the field separator.

  • 1

    This is awk's cryptic shorthand for print-the-line.

  • OFS=""

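    This tells awk to use an empty string as the output field separator. When awk rebuilds the line after a field has been modified, the fields are joined with nothing between them, which has the effect of removing the quotes.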
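The one-liner above can be wrapped in a reusable shell function. This is a sketch rather than the answerer's exact code: the name strip_quoted_commas is hypothetical, and the extra `$1 = $1` is an addition that forces awk to rebuild every line with the empty OFS. That way a quoted field that happens to contain no comma also loses its quotes, since depending on the awk implementation a gsub that makes no replacement may not trigger the rebuild. It assumes quotes are balanced on each line and that fields never contain escaped "" quotes.

```shell
# Hypothetical helper: remove commas (and the surrounding quotes) from "..." fields.
# Assumes balanced quotes and no escaped "" inside a field.
strip_quoted_commas() {
  awk -F'"' -v OFS='' '{
    # Splitting on the double quote puts every quoted section in an even-numbered field.
    for (i = 2; i <= NF; i += 2)
      gsub(/,/, "", $i)
    # Assigning a field forces awk to rebuild $0 with the empty OFS,
    # so the quotes disappear even when gsub changed nothing.
    $1 = $1
  } 1' "$@"
}
```

Usage: `strip_quoted_commas input.csv > output.csv`.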

John1024 answered Oct 29 '25 06:10


$ sed -E 's/"([^"]+),([^"]+)"/\1\2/g' file
black, 1340.00, brown, white, 150.00, blue
apple, 10.00, bread, coffee, 1850.00, juice
cat, dog, 995.00, tiger, 2450.00

The above will only work for amounts up to 999,999.99 (as in the sample input/output), since it can only remove one comma from each number.
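If you would rather stay in sed, one way to lift the one-comma limit is to loop with a label and the t command, deleting one quoted comma per pass and stripping the quotes at the end. This is a sketch assuming a sed that accepts -E and branch labels (GNU sed does); the function name is hypothetical. Anchoring the pattern at the start of the line and requiring an odd number of quotes before the comma means only commas inside a quoted field are removed, even when another quoted field on the same line contains no comma.

```shell
# Hypothetical helper: sed variant that handles any number of commas per quoted field.
# :a / ta  -> repeat the substitution until it no longer matches.
# The pattern matches from ^ through pairs of quotes, then an opening quote,
# then non-comma text, then a comma; that comma is deleted.
# The final s/"//g removes the quotes themselves.
strip_quoted_commas_sed() {
  sed -E \
    -e ':a' \
    -e 's/^(([^"]*"[^"]*")*[^"]*"[^",]*),/\1/' \
    -e 'ta' \
    -e 's/"//g' \
    "$@"
}
```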

This will work for any number:

awk '{while ( match($0,/([^"]*)("[^"]+")(.*)/,a) ) { gsub(/[",]/,"",a[2]); $0 = a[1] a[2] a[3] } }1' file

The above uses GNU awk for the third argument to match(); with other awks you'd use three calls to substr() instead.
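For awks without the three-argument match(), the substr() approach mentioned above might look like the following sketch. It relies only on POSIX match(), RSTART, RLENGTH, substr() and gsub(); the function name is hypothetical and, like the gawk version, it assumes the quotes on each line are balanced.

```shell
# Hypothetical helper: portable (POSIX awk) variant of the match() loop above.
strip_quoted_commas_posix() {
  awk '{
    # Repeat while the line still contains a quoted section such as "1,340.00".
    while (match($0, /"[^"]+"/)) {
      pre  = substr($0, 1, RSTART - 1)     # text before the opening quote
      q    = substr($0, RSTART, RLENGTH)   # the quoted section, quotes included
      post = substr($0, RSTART + RLENGTH)  # text after the closing quote
      gsub(/[",]/, "", q)                  # drop the quotes and any commas
      $0 = pre q post
    }
  } 1' "$@"
}
```

Each pass removes one pair of quotes, so the loop always terminates.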

@John1024's answer is the more sensible and concise approach, though, if you can have multiple commas in a field.

Ed Morton answered Oct 29 '25 08:10
