Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Removing Extra commas from Comma delimited file

Tags:

regex

csv

sed

awk

perl

I have a comma delimited file with 12 columns.

There is problem with 5th and 6th columns (the text in 5th and 6th column is identical, but may have extra commas between them) which contains extra commas.

 2011,123456,1234567,12345678,Hey There,How are you,Hey There,How are you,882864309037,ABC   ABCD,LABACD,1.00000000,80.2500000,One Two

So in the above example "Hey There,How are you" should not have a comma.

I need to remove extra commas in 5th and 6th column.

like image 366
Stu Avatar asked Mar 22 '23 12:03

Stu


1 Answers

If you always want to remove the 5th comma, try

sed 's/,//5' input.txt

But you are saying, it may have extra commas. You have to provide a logic how to find out if extra commas are there or not.

If you know the number of commas, you can use awk. This has proven to be quite an exercise, I am sure someone else will come up with a more elegant solution, but I'll share mine anyway:

awk -f script.awk input.txt

with script.awk:

BEGIN{
    FS=","
}
NF<=12{
    print $0
}
NF>12{
    for (i=1; i<=4; i++) printf $i FS
    for (j=0; j<2; j++){
        for (i=0; i<=(NF-12)/2; i++){
            printf $(i+5)
            if (i<(NF-12)/2) printf "_"
            else printf FS
        }
    }
    for (i=NF-5; i<=NF; i++) printf $i FS
    printf "n"
}

First we set the field separator to ,. If we count less or equal to 12 fields, everything's fine and we simply print the whole line. If there are more than 12 fields, we print first the first 4 fields (again with the field separator), and then we print twice field 5 (and field 6), but instead of printing the ,, we exchange it with _. In the end we print the remaining fields.

As I said, there is probably a more elegant solution to this. I wonder with what other people come up.

like image 91
pfnuesel Avatar answered Apr 01 '23 17:04

pfnuesel