Removing multiple delimiters between outside delimiters on each line

Using awk or sed in a bash script, I need to remove the extra commas that sit between the first and the last comma on each line, replacing each with a space. Because of these extra commas, values end up in the wrong columns, where only 3 columns are desired.

For example, I want to turn this:

2020/11/04,Test Account,569.00
2020/11/05,Test,Account,250.00
2020/11/05,More,Test,Accounts,225.00

Into this:

2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00

I've tried a few things while testing regexes, but I cannot find a pattern that selects only the commas that need to be removed.

regexr sample

A.J. Hart asked Nov 11 '20 12:11


3 Answers

Use this Perl one-liner:

perl -F',' -lane 'print join ",", $F[0], "@F[1 .. ($#F-1)]", $F[-1];' in.csv

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified with the -F option.
-F',' : Split into @F on comma, rather than on whitespace.

$F[0] : first element of the array @F (= first comma-delimited value).
$F[-1] : last element of @F.
@F[1 .. ($#F-1)] : elements of @F between the second from the start and the second from the end, inclusive.
"@F[1 .. ($#F-1)]" : the above elements, interpolated into a string and joined on single spaces (the default value of the list separator $").
join ",", ... : join the LIST "..." on a comma, and return the resulting string.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
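
Since the question also mentions sed, the same effect (keep the first and last comma, turn every comma between them into a space) can be sketched with a sed loop. This is an alternative to the Perl one-liner above, not part of that answer. The function name merge_middle_sed is hypothetical, and the sketch assumes a sed that supports -E (GNU and BSD sed both do).

```shell
# Hypothetical helper: reads from the given files, or from stdin if none are given.
# The substitution only matches while a line still has three or more commas.
# Each pass turns one comma that lies strictly between the first and the last
# comma into a space; "ta" jumps back to label "a" until nothing is replaced.
merge_middle_sed() {
    sed -E \
        -e ':a' \
        -e 's/^([^,]*,.*),(.*,[^,]*)$/\1 \2/' \
        -e 'ta' \
        "$@"
}
```

Usage: merge_middle_sed in.csv. Lines that already have three or fewer fields never match the pattern and pass through unchanged.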

Timur Shtatland answered Sep 20 '22 13:09


awk -F, '{ printf "%s,",$1;for (i=2;i<=NF-2;i++) { printf "%s ",$i };printf "%s,%s\n",$(NF-1),$NF }' file

Using awk, print the first comma-delimited field followed by a comma, then loop over fields 2 through NF-2, printing each one followed by a space. Finally, print the second-to-last field, a comma, and the last field. This assumes every line has at least three fields.
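
The loop described above reads $(NF-1) even when a line has only one or two fields, which would repeat a field in the output. A minimal sketch of the same approach with a guard for such short lines follows. The function name merge_middle_awk is hypothetical, and passing short lines through unchanged is an assumption about the desired behaviour.

```shell
# Hypothetical helper: reads from the given files, or from stdin if none are given.
merge_middle_awk() {
    awk -F, '
        NF < 3 { print; next }          # too few fields: pass the line through unchanged
        {
            out = $1 "," $2             # first field, then the start of the merged middle
            for (i = 3; i < NF; i++)    # remaining middle fields, separated by spaces
                out = out " " $i
            print out "," $NF           # the last field closes the line
        }
    ' "$@"
}
```

Usage: merge_middle_awk file. Building the line in a variable keeps the separator logic in one place, so no trailing space has to be special-cased.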

Raman Sailopal answered Oct 16 '22 10:10


With GNU awk for the 3rd arg to match():

$ awk -v OFS=, '{
     match($0,/([^,]*),(.*),([^,]*)/,a)
     gsub(/,/," ",a[2])
     print a[1], a[2], a[3]
}' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00

or with any awk:

$ awk '
    BEGIN { FS=OFS="," }
    {
        n = split($0,a)
        gsub(/^[^,]*,|,[^,]*$/,"")
        gsub(/,/," ")
        print a[1], $0, a[n]
    }
' file
2020/11/04,Test Account,569.00
2020/11/05,Test Account,250.00
2020/11/05,More Test Accounts,225.00

Ed Morton answered Oct 16 '22 10:10