Add comma sepated values inside a column

Tags:

awk

Hi I have a file format (TSV) as like this

Name  type    Age     Weight       Height 
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5 
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5

I would like to add all the values in Age, Weight and Height and add a column after this, then so some percentage also, like Total_Height/Total_Weight (awk '$0=$0"\t"(NR==1?"Percentage":$8/$7)'). I have large data set and it is not possible to do with excel.

Like this

Name  type    Age     Weight       Height     Total_Age Total_Weight Total_Height Percentage
Xxx   M    12,34,23  50,30,60,70   4,5,6,5.5   69        210         20.5          0.097            
Yxx   F    21,14,32  40,50,20,40   3,4,5,5.5   67        150         17.5          0.11

928

asked Oct 11 '21 11:10

ZenMac

3 Answers

With your shown samples please try following code.

awk '
FNR==1{
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
  $(NF+1)=totAge
  $(NF+1)=totWeight
  $(NF+1)=totHeight
  $(NF+1)=$(NF-1)==0?"N/A":$NF/$(NF-1)
}
1' Input_file | column -t

OR adding a bit short version of above awk code:

awk '
BEGIN{OFS="\t"}
FNR==1{
  print $0,"Total_Age Total_Weight Total_Height Percentage"
  next
}
FNR>1{
  totAge=totWeight=totHeight=0
  split($3,tmp,",")
  for(i in tmp){
    totAge+=tmp[i]
  }
  split($4,tmp,",")
  for(i in tmp){
    totWeight+=tmp[i]
  }
  split($5,tmp,",")
  for(i in tmp){
    totHeight+=tmp[i]
  }
  $(NF+1)=totAge OFS totWeight OFS totHeight
  $0=$0
  $(NF+1)=( $(NF-1)==0 ? "N/A" : $NF/$(NF-1) )
}
1' Input_file | column -t

Explanation: Simple explanation would be, take sum of 3rd, 4th and 5th columns and assign them to last column of line. Accordingly add column value which has divide value of last and 2nd last columns as per OP's request. Using column -t to make it look better on output.

168

answered Oct 24 '22 02:10

RavinderSingh13

Using any awk in any shell on every Unix box and without creating new fields in each record (which is inefficient as it causes awk to re-build the record every time you change a field) and without updating the input record (which is inefficient as it causes awk to re-split the record into fields every time you change the record) and designed to work for any number of value input columns in any order:

$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ printf "%s%s", $0, OFS }
NR==1 {
    for (i=3; i<=NF; i++) {
        printf "Total_%s%s", $i, OFS
        tags[i] = $i
    }
    print "Percentage"
    next
}
{
    delete tot
    for (i=3; i<=NF; i++) {
        tag = tags[i]
        n = split($i,vals,",")
        for (j in vals) {
            tot[tag] += vals[j]
        }
        printf "%s%s", tot[tag], OFS
    }
    printf "%0.3f%s", (tot["Weight"] ? tot["Height"] / tot["Weight"] : 0), ORS
}

$ awk -f tst.awk file
Name    type    Age     Weight  Height  Total_Age       Total_Weight    Total_Height    Percentage
Xxx     M       12,34,23        50,30,60,70     4,5,6,5.5       69      210     20.5    0.098
Yxx     F       21,14,32        40,50,20,40     3,4,5,5.5       67      150     17.5    0.117

$ awk -f tst.awk file | column -t
Name  type  Age       Weight       Height     Total_Age  Total_Weight  Total_Height  Percentage
Xxx   M     12,34,23  50,30,60,70  4,5,6,5.5  69         210           20.5          0.098
Yxx   F     21,14,32  40,50,20,40  3,4,5,5.5  67         150           17.5          0.117

To show the functional advantages of the above approach, imagine you need to add more values like ShoeSize and/or rearrange the order of the columns, e.g.:

$ column -t file
Name  type  ShoeSize  Height     Age       Weight
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40

Now run the above script and notice you get Total_ columns added for every original column and you still get the same Percentage column of Height/Weight added to the end:

$ awk -f tst.awk file | column -t
Name  type  ShoeSize  Height     Age       Weight       Total_ShoeSize  Total_Height  Total_Age  Total_Weight  Percentage
Xxx   M     12,8,10   4,5,6,5.5  12,34,23  50,30,60,70  30              20.5          69         210           0.098
Yxx   F     9,7,8     3,4,5,5.5  21,14,32  40,50,20,40  24              17.5          67         150           0.117

answered Oct 24 '22 02:10

Ed Morton

If you have to do the same operations multiple times, you might also use a function to sum the array values (given that the values are numbers separated by comma's).

Reusing some parts of the answer from @RavinderSingh13 and a massive thank you to @Ed Morton taking the time to provide great feedback improving the code:

awk '
function arraySum(field,      sum,arr,i) {
  split(field,arr,",")
  for (i in arr) sum += arr[i]
  return sum
}
FNR==1{
  print $0, "Total_Age", "Total_Weight", "Total_Height", "Percentage"
  next
}
NR > 1 {
  sumWeight = arraySum($4)
  sumHeight = arraySum($5)
  print $0, arraySum($3), sumWeight, sumHeight, (sumWeight ? sumHeight/sumWeight : 0)
}' file | column -t

Output

Name  type  Age       Weight       Height     Total_Age  Total_Weight  Total_Height  Percentage
Xxx   M     12,34,23  50,30,60,70  4,5,6,5.5  69         210           20.5          0.097619
Yxx   F     21,14,32  40,50,20,40  3,4,5,5.5  67         150           17.5          0.116667

answered Oct 24 '22 00:10

The fourth bird

Related questions
                            
                                Inheritance - __hash__ sets to None in a subclass
                            
                                Python-pandas: the truth value of a series is ambiguous
                            
                                How to run multiple Python scripts and an executable files using Docker?
                            
                                Why does Python sort put upper case items first?
                            
                                Dask item assignment. Cannot use loc for item assignment
                            
                                Django model fields with dynamic names
                            
                                Flask restful pagination
                            
                                Replace <br> with space in BeautifulSoap output
                            
                                Firebase Firestore: get generated ID of document (Python)
                            
                                Pip install killed - out of memory - how to get around it?
                            
                                Datetime issue with matplotlib
                            
                                module 'pandas' has no attribute 'tslib'
                            
                                numpy.polyfit vs numpy.polynomial.polynomial.polyfit
                            
                                Chrome 80 how to decode cookies
                            
                                Iterating over dictionary using __getitem__ in python [duplicate]
                            
                                FastAPI variable query parameters
                            
                                Input image dtype is bool. Interpolation is not defined with bool data type
                            
                                Plot Confusion Matrix for multilabel Classifcation Python
                            
                                Mysql 'VALUES function' is deprecated
                            
                                Python Walrus Operator in While Loops

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With