Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add comma sepated values inside a column

Tags:

python

awk

Hi I have a file format (TSV) as like this

Name  type    Age     Weight       Height 
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5 
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5

I would like to add all the values in Age, Weight and Height and add a column after this, then so some percentage also, like Total_Height/Total_Weight (awk '$0=$0"\t"(NR==1?"Percentage":$8/$7)'). I have large data set and it is not possible to do with excel.

Like this

Name  type    Age     Weight       Height     Total_Age Total_Weight Total_Height Percentage
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5   69        210         20.5          0.097            
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5   67        150         17.5          0.11 
like image 928
ZenMac Avatar asked Oct 11 '21 11:10

ZenMac


People also ask

How do you add comma separated values in an array?

Use the String. split() method to convert a comma separated string to an array, e.g. const arr = str. split(',') . The split() method will split the string on each occurrence of a comma and will return an array containing the results.

How to insert comma between rows and columns in Excel?

2. In the Split Cells dialog box, select Split to Rows or Split to Columns in the Type section as you need. And in the Specify a separator section, select the Other option, enter the comma symbol into the textbox, and then click the OK button. See screenshot:

How to split comma separated values of selected cells into columns?

In some cases, you may need to split comma separated values of selected cells into rows or columns. Here we will introduce three methods for you to solve this problem in details. You can apply the Text to Columns function to split comma separated values of selected cells into columns. Please do as follows. 1.

What happens when a field has a comma in its value?

Now if a field has a comma(,) in its value, obviously it treats the rest of the field value as the next column and the copy aborts since it is having one extra column than expected. How to escape the comma(,) in between the field value?

How to make a comma-separated list of cells in Excel?

If you have access to Microsoft Excel 365, you can use the TEXTJOIN function to join the cell values of a column or range to make a comma-separated list. First, we have to write down the below formula in cell C5.


3 Answers

With your shown samples please try following code.

awk '
FNR==1{
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
  $(NF+1)=totAge
  $(NF+1)=totWeight
  $(NF+1)=totHeight
  $(NF+1)=$(NF-1)==0?"N/A":$NF/$(NF-1)
}
1' Input_file | column -t

OR adding a bit short version of above awk code:

awk '
BEGIN{OFS="\t"}
FNR==1{
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
  $(NF+1)=totAge OFS totWeight OFS totHeight
  $0=$0
  $(NF+1)=( $(NF-1)==0 ? "N/A" : $NF/$(NF-1) )
}
1' Input_file | column -t

Explanation: Simple explanation would be, take sum of 3rd, 4th and 5th columns and assign them to last column of line. Accordingly add column value which has divide value of last and 2nd last columns as per OP's request. Using column -t to make it look better on output.

like image 168
RavinderSingh13 Avatar answered Oct 24 '22 02:10

RavinderSingh13


Using any awk in any shell on every Unix box and without creating new fields in each record (which is inefficient as it causes awk to re-build the record every time you change a field) and without updating the input record (which is inefficient as it causes awk to re-split the record into fields every time you change the record) and designed to work for any number of value input columns in any order:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ printf "%s%s", $0, OFS }
NR==1 {
    for (i=3; i<=NF; i++) {
        printf "Total_%s%s", $i, OFS
        tags[i] = $i
    }
    print "Percentage"
    next
}
{
    delete tot
    for (i=3; i<=NF; i++) {
        tag = tags[i]
        n = split($i,vals,",")
        for (j in vals) {
            tot[tag] += vals[j]
        }
        printf "%s%s", tot[tag], OFS
    }
    printf "%0.3f%s", (tot["Weight"] ? tot["Height"] / tot["Weight"] : 0), ORS
}

$ awk -f tst.awk file
Name    type    Age     Weight  Height  Total_Age       Total_Weight    Total_Height    Percentage
Xxx     M       12,34,23        50,30,60,70     4,5,6,5.5       69      210     20.5    0.098
Yxx     F       21,14,32        40,50,20,40     3,4,5,5.5       67      150     17.5    0.117

$ awk -f tst.awk file | column -t
Name  type  Age       Weight       Height     Total_Age  Total_Weight  Total_Height  Percentage
Xxx   M     12,34,23  50,30,60,70  4,5,6,5.5  69         210           20.5          0.098
Yxx   F     21,14,32  40,50,20,40  3,4,5,5.5  67         150           17.5          0.117

To show the functional advantages of the above approach, imagine you need to add more values like ShoeSize and/or rearrange the order of the columns, e.g.:

$ column -t file
Name  type  ShoeSize  Height     Age       Weight
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40

Now run the above script and notice you get Total_ columns added for every original column and you still get the same Percentage column of Height/Weight added to the end:

$ awk -f tst.awk file | column -t
Name  type  ShoeSize  Height     Age       Weight       Total_ShoeSize  Total_Height  Total_Age  Total_Weight  Percentage
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70  30              20.5          69         210           0.098
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40  24              17.5          67         150           0.117
like image 5
Ed Morton Avatar answered Oct 24 '22 02:10

Ed Morton


If you have to do the same operations multiple times, you might also use a function to sum the array values (given that the values are numbers separated by comma's).

Reusing some parts of the answer from @RavinderSingh13 and a massive thank you to @Ed Morton taking the time to provide great feedback improving the code:

awk '
function arraySum(field,      sum,arr,i) {
  split(field,arr,",")
  for (i in arr) sum += arr[i]
  return sum
}
FNR==1{
  print $0, "Total_Age", "Total_Weight", "Total_Height", "Percentage"
  next
}
NR > 1 {
  sumWeight = arraySum($4)
  sumHeight = arraySum($5)
  print $0, arraySum($3), sumWeight, sumHeight, (sumWeight ? sumHeight/sumWeight : 0)
}' file | column -t

Output

Name  type  Age       Weight       Height     Total_Age  Total_Weight  Total_Height  Percentage
Xxx   M     12,34,23  50,30,60,70  4,5,6,5.5  69         210           20.5          0.097619
Yxx   F     21,14,32  40,50,20,40  3,4,5,5.5  67         150           17.5          0.116667
like image 3
The fourth bird Avatar answered Oct 24 '22 00:10

The fourth bird