Given this sample input:
ID Sample1 Sample2 Sample3
One 10 0 5
Two 3 6 8
Three 3 4 7
I needed to produce this output using AWK:
ID Sample1 Sample2 Sample3
One 62.50 0.00 25.00
Two 18.75 60.00 40.00
Three 18.75 40.00 35.00
This is how I solved it:
function percent(value, total) {
    return sprintf("%.2f", 100 * value / total)
}
{
    label[NR] = $1
    for (i = 2; i <= NF; ++i) {
        sum[i] += col[i][NR] = $i
    }
}
END {
    title = label[1]
    for (i = 2; i <= length(col) + 1; ++i) {
        title = title "\t" col[i][1]
    }
    print title
    for (j = 2; j <= NR; ++j) {
        line = label[j]
        for (i = 2; i <= length(col) + 1; ++i) {
            line = line "\t" percent(col[i][j], sum[i])
        }
        print line
    }
}
This works fine in GNU AWK (awk on Linux, gawk on BSD), but not in BSD AWK, where I get this error:
$ awk -f script.awk sample.txt
awk: syntax error at source line 7 source file script.awk
 context is
        sum[i] += >>> col[i][ <<<
awk: illegal statement at source line 7 source file script.awk
awk: illegal statement at source line 7 source file script.awk
It seems the problem is with the multidimensional arrays. I'd like to make this script work in BSD AWK too, so it's more portable.
Is there a way to change this to make it work in BSD AWK?
Try using the pseudo-2-dimensional form. Instead of
col[i][NR]
use
col[i,NR]
That is a 1-dimensional array; the key is the concatenated string i SUBSEP NR.
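To make the key concatenation concrete, here is a minimal sketch wrapped in a shell function so it can be run as-is. The function name subsep_demo and the sample indices are made up for illustration; only SUBSEP, split and the (i, j) in array test are standard awk.

```sh
# Hypothetical demo: shows that col[2, 3] and col[2 SUBSEP 3] are the same element.
subsep_demo() {
    awk 'BEGIN {
        # col[2, 3] stores one element in a flat array under a single string key.
        col[2, 3] = "x"
        key = 2 SUBSEP 3
        if (key in col) print "same key"

        # Membership test using the parenthesized subscript list.
        if ((2, 3) in col) print "found"

        # Recover both indices from a key by splitting on SUBSEP.
        for (k in col) {
            split(k, idx, SUBSEP)
            print "column=" idx[1], "row=" idx[2]
        }
    }'
}

subsep_demo
```

Because the array is flat, there is no per-column sub-array to iterate over; any loop over one "dimension" needs its bounds tracked separately.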
@glenn's answer got me on the right path. It took a bit more work though:
Using col[i, NR] made dealing with the column titles troublesome. It helped a lot to remove the buffering of the column titles and print them immediately after reading.
length(col) + 1 was no longer usable in the final loop condition, as using col[i, j] made the loops infinite. As a workaround, I could replace length(col) + 1 with simply NF.
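A likely explanation for the infinite loop: with col[i][NR], length(col) was the number of columns, but with the flat col[i, NR] it is the number of stored cells, and merely looking up a missing element creates it, so the bound keeps moving. The sketch below illustrates this under a few assumptions: length_pitfall_demo and count are hypothetical names, count is a hand-rolled stand-in for length(col) so the demo does not depend on any particular awk, and the three-line input is invented.

```sh
# Hypothetical demo of why the element count is not a usable column bound.
length_pitfall_demo() {
    printf '%s\n' 'ID S1 S2' 'One 10 0' 'Two 3 6' |
    awk '
    # Portable element count; stands in for length(col) in this sketch.
    function count(arr,    k, n) { n = 0; for (k in arr) n++; return n }
    NR == 1 { ncols = NF; next }    # remember the column count once
    { for (i = 2; i <= NF; ++i) col[i, NR] = $i }
    END {
        print "elements stored: " count(col)          # cells, not columns
        unused = col[99, 99]                          # a bare lookup creates the element
        print "after one stray lookup: " count(col)
        print "columns from header: " ncols
    }'
}

length_pitfall_demo
```

Keeping the column count in its own variable, as ncols does here, gives a bound that lookups cannot change.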
Here's the final implementation, which now works in both the GNU and BSD versions of AWK:
function percent(value, total) {
    return sprintf("%.2f", 100 * value / total)
}
BEGIN { OFS = "\t" }
NR == 1 { gsub(/ +/, OFS); print }
NR != 1 {
    label[NR] = $1
    for (i = 2; i <= NF; ++i) {
        sum[i] += col[i, NR] = $i
    }
}
END {
    for (j = 2; j <= NR; ++j) {
        line = label[j]
        for (i = 2; i <= NF; ++i) {
            line = line OFS percent(col[i, j], sum[i])
        }
        print line
    }
}
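One possible variation, not part of the answer above: NF inside END reflects the last record read, so a short or blank final line would change the loop bound. The sketch below records the column count from the header instead. The names percent_table and ncols are introduced here for illustration, and the script is wrapped in a shell function only so it can be run against the sample input directly.

```sh
# Hypothetical variant of the final script: column count taken from the header row.
percent_table() {
    awk '
    function percent(value, total) {
        return sprintf("%.2f", 100 * value / total)
    }
    BEGIN { OFS = "\t" }
    # Header: remember how many columns there are, then print it tab-separated.
    NR == 1 { ncols = NF; gsub(/ +/, OFS); print; next }
    {
        label[NR] = $1
        for (i = 2; i <= ncols; ++i) {
            sum[i] += col[i, NR] = $i
        }
    }
    END {
        for (j = 2; j <= NR; ++j) {
            line = label[j]
            for (i = 2; i <= ncols; ++i) {
                line = line OFS percent(col[i, j], sum[i])
            }
            print line
        }
    }' "$@"
}

# Run it on the sample input from the question.
printf '%s\n' 'ID Sample1 Sample2 Sample3' 'One 10 0 5' 'Two 3 6 8' 'Three 3 4 7' |
    percent_table
```

Like the original, this assumes every column has a non-zero sum; a column of all zeros would make percent divide by zero.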