Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

awk: change field separator keeping first column as is

Tags:

regex

field

awk

I have an in.csv file with just one column in this fashion:

Sample
a_b_c
d_e_f
g_h_i

I want to change the field separator from _ to , and print the separate fields, but keeping the input column as is in the first column of the output file. I want to use awk in principle.

This is what I have so far:

awk 'BEGIN {FS="_";OFS=","} {$1=$1}1' in.csv > out.csv

which gives me this

Sample
a,b,c
d,e,f
g,h,i

How can I output it like this instead, preserving the original column (renamed ID)?

ID,group1,group2,group3
a_b_c,a,b,c
d_e_f,d,e,f
g_h_i,g,h,i

Note that input number of fields is variable, and the input Sample line might be other name, or be empty, or even non-existent, but I would still want the output like that...

EDIT

After checking all the answers, I have to clarify here the input file above is just an example... the real files I have usually have more than 3 fields separated by _ (but I do not know how many beforehand) and countless lines, however I will try to make sure all the rows in a given file are consistent on the number of fields to "split".

The answers below do not seem to work when my files have more or less than exactly 3 fields to split per row, I would need a more universal one-liner if that is possible.

For the moment, to keep things simple, I'd rather not do anything to the header line and leave it as is.

This means that for this other example:

Some_header
a_b_c_1
d_e_f_2
g_h_i_3

I would like to obtain this:

Some_header
a_b_c_1,a,b,c,1
d_e_f_2,d,e,f,2
g_h_i_3,g,h,i,3

Optimally, the one-liner should deal with cases where there are rows with inconsistent fields, so from a file like this:

Some_header
a_b_c
d_e_f_2
g_h_i_3_4

I would like to obtain this:

Some_header
a_b_c,a,b,c
d_e_f_2,d,e,f,2
g_h_i_3_4,g,h,i,3,4

Isn't there a way to record the row with the _ in a variable, then split the variable by _, and then print the variable and all its components separated by ,? Sorry I thought this would be easier... Maybe it would be easier with a Perl one-liner? Sorry not so proficient with one-liners... Thanks again!

like image 230
DaniCee Avatar asked Dec 06 '22 08:12

DaniCee


2 Answers

Could you please try following, written and tested on shown samples only. This should work with any number of fields too tested it in https://ideone.com/fWgggq

awk '
BEGIN{
  FS="_"
  OFS=","
  print "ID,group1,group2,group3"
}
FNR>1{
  val=$0
  $1=$1
  print val,$0
}'  Input_file

Explanation: Adding detailed explanation for above.

awk '                                   ##Starting awk program from here.
BEGIN{                                  ##Starting BEGIN section of program from here.
  FS="_"                                ##Setting field separator as _ here,
  OFS=","                               ##Setting OFS as comma here.
  print "ID,group1,group2,group3"       ##Printing header as per OP requirement here.
}
FNR>1{                                  ##Checking condition if this is greater than 1st line then do following.
  val=$0                                ##Store current line into var val here.
  $1=$1                                 ##reassign first field to itself so that new OFS which is , is implemented to whole line.
  print val,$0                          ##Printing current new line here.
}'  Input_file                          ##Mentioning Input_file name here.
like image 97
RavinderSingh13 Avatar answered Dec 18 '22 17:12

RavinderSingh13


Another one which, however, does not deal with the header line (atm anyway, left as an exercize etc.):

$ awk '
BEGIN {
    FS="_"                                # set delimiters
    OFS=","
}
{
    for(i=0;i<=NF;i++)                    # loop from 0 to get $0
        printf "%s%s",$i,(i==NF?ORS:OFS)  # print dealing with OFS and EOL
}' file

Output:

Sample,Sample
a_b_c,a,b,c
d_e_f,d,e,f
g_h_i,g,h,i

Yet another one which does deal with different input files with variable number of groups picking the header count from the first data record (NR==2):

$ awk '
BEGIN {
    FS="_"                                # set delimiters
    OFS=","
}
NR>=2 {                                   # process only data records, not header
    if(NR==2)                             # create the header
        for(i=0;i<=NF;i++)
            printf "%s%s",(i==0?"ID":"group" i),(i==NF?ORS:OFS)
    for(i=0;i<=NF;i++)                    # loop from 0 to get $0
        printf "%s%s",$i,(i==NF?ORS:OFS)  # print dealing with OFS and ORS
}' file

Output:

ID,group1,group2,group3
a_b_c,a,b,c
d_e_f,d,e,f
g_h_i,g,h,i

And finally a short one using GNU awk:

$ awk '$0=$0 (gensub(/(^|_)/,",","g"))' file
like image 37
James Brown Avatar answered Dec 18 '22 17:12

James Brown