I have an in.csv
file with just one column in this fashion:
Sample
a_b_c
d_e_f
g_h_i
I want to change the field separator from _
to ,
and print the separate fields, but keeping the input column as is in the first column of the output file. I want to use awk
in principle.
This is what I have so far:
awk 'BEGIN {FS="_";OFS=","} {$1=$1}1' in.csv > out.csv
which gives me this
Sample
a,b,c
d,e,f
g,h,i
How can I output it like this instead, preserving the original column (renamed ID
)?
ID,group1,group2,group3
a_b_c,a,b,c
d_e_f,d,e,f
g_h_i,g,h,i
Note that input number of fields is variable, and the input Sample
line might be other name, or be empty, or even non-existent, but I would still want the output like that...
EDIT
After checking all the answers, I have to clarify here the input file above is just an example... the real files I have usually have more than 3 fields separated by _
(but I do not know how many beforehand) and countless lines, however I will try to make sure all the rows in a given file are consistent on the number of fields to "split".
The answers below do not seem to work when my files have more or less than exactly 3 fields to split per row, I would need a more universal one-liner if that is possible.
For the moment, to keep things simple, I'd rather not do anything to the header line and leave it as is.
This means that for this other example:
Some_header
a_b_c_1
d_e_f_2
g_h_i_3
I would like to obtain this:
Some_header
a_b_c_1,a,b,c,1
d_e_f_2,d,e,f,2
g_h_i_3,g,h,i,3
Optimally, the one-liner should deal with cases where there are rows with inconsistent fields, so from a file like this:
Some_header
a_b_c
d_e_f_2
g_h_i_3_4
I would like to obtain this:
Some_header
a_b_c,a,b,c
d_e_f_2,d,e,f,2
g_h_i_3_4,g,h,i,3,4
Isn't there a way to record the row with the _
in a variable, then split the variable by _
, and then print the variable and all its components separated by ,
? Sorry I thought this would be easier... Maybe it would be easier with a Perl
one-liner? Sorry not so proficient with one-liners... Thanks again!
Could you please try following, written and tested on shown samples only. This should work with any number of fields too tested it in https://ideone.com/fWgggq
awk '
BEGIN{
FS="_"
OFS=","
print "ID,group1,group2,group3"
}
FNR>1{
val=$0
$1=$1
print val,$0
}' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section of program from here.
FS="_" ##Setting field separator as _ here,
OFS="," ##Setting OFS as comma here.
print "ID,group1,group2,group3" ##Printing header as per OP requirement here.
}
FNR>1{ ##Checking condition if this is greater than 1st line then do following.
val=$0 ##Store current line into var val here.
$1=$1 ##reassign first field to itself so that new OFS which is , is implemented to whole line.
print val,$0 ##Printing current new line here.
}' Input_file ##Mentioning Input_file name here.
Another one which, however, does not deal with the header line (atm anyway, left as an exercize etc.):
$ awk '
BEGIN {
FS="_" # set delimiters
OFS=","
}
{
for(i=0;i<=NF;i++) # loop from 0 to get $0
printf "%s%s",$i,(i==NF?ORS:OFS) # print dealing with OFS and EOL
}' file
Output:
Sample,Sample
a_b_c,a,b,c
d_e_f,d,e,f
g_h_i,g,h,i
Yet another one which does deal with different input files with variable number of groups picking the header count from the first data record (NR==2
):
$ awk '
BEGIN {
FS="_" # set delimiters
OFS=","
}
NR>=2 { # process only data records, not header
if(NR==2) # create the header
for(i=0;i<=NF;i++)
printf "%s%s",(i==0?"ID":"group" i),(i==NF?ORS:OFS)
for(i=0;i<=NF;i++) # loop from 0 to get $0
printf "%s%s",$i,(i==NF?ORS:OFS) # print dealing with OFS and ORS
}' file
Output:
ID,group1,group2,group3
a_b_c,a,b,c
d_e_f,d,e,f
g_h_i,g,h,i
And finally a short one using GNU awk:
$ awk '$0=$0 (gensub(/(^|_)/,",","g"))' file
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With