Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combine two columns into new and print all columns

Tags:

awk

I want to combine columns 1 and 2 and add them as a new column in my data frame. Then I want to print all the old columns and the newly created column. I can combine the columns using the script below, but not sure how to print all columns, not only the combined:

awk ' { print $1 $2 "_" $NF } ' input_file

in

c1 c2 c3
12 1  12
4 4 57

out

c1 c2 c3 c4
12 1  12 12_1
4 4 57 4_4
like image 968
user2300940 Avatar asked Oct 23 '21 09:10

user2300940


People also ask

Can you combine two columns in Excel without losing data?

Join columns using the Merge Cells add-in for Excel With the Merge Cells add-in, you can combine data from several cells using any separator you like (e.g. space, comma, carriage return or line break). You can join values row by row, column by column or merge data from the selected cells into one without losing it.


4 Answers

Here is a Generic solution in awk. Just mention field numbers in awk variable named fields eg: 1,2,3,4,7,8(example) and it will add all fields values to last column. Written and tested in GNU awk should work in any awk.

awk -v fields="1,2" '
BEGIN{
  num=split(fields,arr,",")
  for(i=1;i<=num;i++){
   field[arr[i]]
  }
}
FNR==1{
  print
  next
}
{
  val=""
  for(i=1;i<=NF;i++){
    if(i in field){
      val=(val?val "_":"")$i
    }
  }
  print $0,val
}
'  Input_file
like image 87
RavinderSingh13 Avatar answered Dec 04 '22 08:12

RavinderSingh13


$ awk '{print $0, (NR>1 ? $1"_"$2 : "c4")}' file
c1 c2 c3 c4
12 1  12 12_1
4 4 57 4_4

or to get tab-separated output if your input is tab-separated:

$ awk 'BEGIN{FS=OFS="\t"} {print $0, (NR>1 ? $1"_"$2 : "c4")}' file
c1      c2      c3      c4
12      1       12      12_1
4       4       57      4_4

or if it isn't:

$ awk -v OFS='\t' '{$(NF+1)=(NR>1 ? $1"_"$2 : "c4")} 1' file
c1      c2      c3      c4
12      1       12      12_1
4       4       57      4_4
like image 33
Ed Morton Avatar answered Dec 04 '22 08:12

Ed Morton


If you want to print the _ between field 1 and 2, then the first output would be c1 c2 c3 c1_c2 instead of c1 c2 c3 c4

You can add a column at the end with the value of $1 and $2 and then print the whole line:

awk ' { $(NF+1) = $1"_"$2 }1' input_file

Output

c1 c2 c3 c1_c2
12 1 12 12_1
4 4 57 4_4

Or you can print the whole line followed by field $1 and $2

awk '{print $0, $1"_"$2}' input_file

Output

c1 c2 c3 c1_c2
12 1  12 12_1
4 4 57 4_4
like image 21
The fourth bird Avatar answered Dec 04 '22 08:12

The fourth bird


Another awk which at FNR==1 uses the field name in $NF to create the field name for the next field (c3 -> c4, c -> c1, etc):

$ awk '{
    printf "%s%s%s\n",
        $0,
        OFS,
        (FNR>1?$1 "_" $2:(match($3,/[0-9]+$/)?substr($3,1,RSTART-1) substr($3,RSTART)+1:$3 1))
}' file

Output:

c1 c2 c3 c4
12 1  12 12_1
4 4 57 4_4
like image 32
James Brown Avatar answered Dec 04 '22 09:12

James Brown