Hi I have a file format (TSV) as like this
Name type Age Weight Height
Xxx M 12,34,23 50,30,60,70 4,5,6,5.5
Yxx F 21,14,32 40,50,20,40 3,4,5,5.5
I would like to add all the values in Age, Weight and Height and add a column after this, then so some percentage also, like Total_Height/Total_Weight (awk '$0=$0"\t"(NR==1?"Percentage":$8/$7)'). I have large data set and it is not possible to do with excel.
Like this
Name type Age Weight Height Total_Age Total_Weight Total_Height Percentage
Xxx M 12,34,23 50,30,60,70 4,5,6,5.5 69 210 20.5 0.097
Yxx F 21,14,32 40,50,20,40 3,4,5,5.5 67 150 17.5 0.11
Use the String. split() method to convert a comma separated string to an array, e.g. const arr = str. split(',') . The split() method will split the string on each occurrence of a comma and will return an array containing the results.
2. In the Split Cells dialog box, select Split to Rows or Split to Columns in the Type section as you need. And in the Specify a separator section, select the Other option, enter the comma symbol into the textbox, and then click the OK button. See screenshot:
In some cases, you may need to split comma separated values of selected cells into rows or columns. Here we will introduce three methods for you to solve this problem in details. You can apply the Text to Columns function to split comma separated values of selected cells into columns. Please do as follows. 1.
Now if a field has a comma(,) in its value, obviously it treats the rest of the field value as the next column and the copy aborts since it is having one extra column than expected. How to escape the comma(,) in between the field value?
If you have access to Microsoft Excel 365, you can use the TEXTJOIN function to join the cell values of a column or range to make a comma-separated list. First, we have to write down the below formula in cell C5.
With your shown samples please try following code.
awk '
FNR==1{
print $0,"Total_Age Total_Weight Total_Height Percentage"
next
}
FNR>1{
totAge=totWeight=totHeight=0
split($3,tmp,",")
for(i in tmp){
totAge+=tmp[i]
}
split($4,tmp,",")
for(i in tmp){
totWeight+=tmp[i]
}
split($5,tmp,",")
for(i in tmp){
totHeight+=tmp[i]
}
$(NF+1)=totAge
$(NF+1)=totWeight
$(NF+1)=totHeight
$(NF+1)=$(NF-1)==0?"N/A":$NF/$(NF-1)
}
1' Input_file | column -t
OR adding a bit short version of above awk
code:
awk '
BEGIN{OFS="\t"}
FNR==1{
print $0,"Total_Age Total_Weight Total_Height Percentage"
next
}
FNR>1{
totAge=totWeight=totHeight=0
split($3,tmp,",")
for(i in tmp){
totAge+=tmp[i]
}
split($4,tmp,",")
for(i in tmp){
totWeight+=tmp[i]
}
split($5,tmp,",")
for(i in tmp){
totHeight+=tmp[i]
}
$(NF+1)=totAge OFS totWeight OFS totHeight
$0=$0
$(NF+1)=( $(NF-1)==0 ? "N/A" : $NF/$(NF-1) )
}
1' Input_file | column -t
Explanation: Simple explanation would be, take sum of 3rd, 4th and 5th columns and assign them to last column of line. Accordingly add column value which has divide value of last and 2nd last columns as per OP's request. Using column -t
to make it look better on output.
Using any awk in any shell on every Unix box and without creating new fields in each record (which is inefficient as it causes awk to re-build the record every time you change a field) and without updating the input record (which is inefficient as it causes awk to re-split the record into fields every time you change the record) and designed to work for any number of value input columns in any order:
$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ printf "%s%s", $0, OFS }
NR==1 {
for (i=3; i<=NF; i++) {
printf "Total_%s%s", $i, OFS
tags[i] = $i
}
print "Percentage"
next
}
{
delete tot
for (i=3; i<=NF; i++) {
tag = tags[i]
n = split($i,vals,",")
for (j in vals) {
tot[tag] += vals[j]
}
printf "%s%s", tot[tag], OFS
}
printf "%0.3f%s", (tot["Weight"] ? tot["Height"] / tot["Weight"] : 0), ORS
}
$ awk -f tst.awk file
Name type Age Weight Height Total_Age Total_Weight Total_Height Percentage
Xxx M 12,34,23 50,30,60,70 4,5,6,5.5 69 210 20.5 0.098
Yxx F 21,14,32 40,50,20,40 3,4,5,5.5 67 150 17.5 0.117
$ awk -f tst.awk file | column -t
Name type Age Weight Height Total_Age Total_Weight Total_Height Percentage
Xxx M 12,34,23 50,30,60,70 4,5,6,5.5 69 210 20.5 0.098
Yxx F 21,14,32 40,50,20,40 3,4,5,5.5 67 150 17.5 0.117
To show the functional advantages of the above approach, imagine you need to add more values like ShoeSize
and/or rearrange the order of the columns, e.g.:
$ column -t file
Name type ShoeSize Height Age Weight
Xxx M 12,8,10 4,5,6,5.5 12,34,23 50,30,60,70
Yxx F 9,7,8 3,4,5,5.5 21,14,32 40,50,20,40
Now run the above script and notice you get Total_
columns added for every original column and you still get the same Percentage
column of Height/Weight added to the end:
$ awk -f tst.awk file | column -t
Name type ShoeSize Height Age Weight Total_ShoeSize Total_Height Total_Age Total_Weight Percentage
Xxx M 12,8,10 4,5,6,5.5 12,34,23 50,30,60,70 30 20.5 69 210 0.098
Yxx F 9,7,8 3,4,5,5.5 21,14,32 40,50,20,40 24 17.5 67 150 0.117
If you have to do the same operations multiple times, you might also use a function to sum the array values (given that the values are numbers separated by comma's).
Reusing some parts of the answer from @RavinderSingh13 and a massive thank you to @Ed Morton taking the time to provide great feedback improving the code:
awk '
function arraySum(field, sum,arr,i) {
split(field,arr,",")
for (i in arr) sum += arr[i]
return sum
}
FNR==1{
print $0, "Total_Age", "Total_Weight", "Total_Height", "Percentage"
next
}
NR > 1 {
sumWeight = arraySum($4)
sumHeight = arraySum($5)
print $0, arraySum($3), sumWeight, sumHeight, (sumWeight ? sumHeight/sumWeight : 0)
}' file | column -t
Output
Name type Age Weight Height Total_Age Total_Weight Total_Height Percentage
Xxx M 12,34,23 50,30,60,70 4,5,6,5.5 69 210 20.5 0.097619
Yxx F 21,14,32 40,50,20,40 3,4,5,5.5 67 150 17.5 0.116667
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With