I am trying to calculate a range of percentiles (5th-99th) in Bash
for a text file that contains 5 values, one per line.
Input
34.5
32.2
33.7
30.4
31.8
Attempted Code
awk '{s[NR-1]=$1} END{print s[int(0.05-0.99)]}' input
Expected Output
99th 34.5
97th 34.4
95th 34.3
90th 34.2
80th 33.9
70th 33.4
60th 32.8
50th 32.2
40th 32.0
30th 31.9
20th 31.5
10th 31.0
5th 30.7
To calculate percentiles from 5 values, one needs to create a mapping between percentiles and values, and to interpolate between the mapped points. The resulting function is called a piecewise linear function (a.k.a. pwlf).
F(100) = 34.5
F(75) = 33.7
F(50) = 32.2
F(25) = 31.8
F(0) = 30.4
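The mapping above can be reproduced by sorting the values and spreading them evenly over 0..100. This is a minimal sketch using the sample values from the question; the function name `breakpoints` is only for illustration, and it assumes at least two input values.

```shell
# Hypothetical helper: print the percentile breakpoints for values read on stdin.
breakpoints() {
    LC_ALL=C sort -n | awk '
        { v[NR] = $1 }
        END {
            # With n sorted values, the i-th one sits at percentile 100*(i-1)/(n-1).
            # Printed from highest to lowest to match the listing above.
            for (i = NR; i >= 1; i--)
                printf "F(%d) = %s\n", 100 * (i - 1) / (NR - 1), v[i]
        }'
}

printf '%s\n' 34.5 32.2 33.7 30.4 31.8 | breakpoints
```

With five values the breakpoints land on 0, 25, 50, 75 and 100, which is where the spacing of 25 comes from.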
Mapping any other x in the range 0..100 requires linear interpolation between F(L) and F(H), where L is the highest mapped percentile <= x and H is the next one up (L + 25 here): F(x) = F(L) + (F(H) - F(L)) * (x - L) / (H - L).
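As a worked example of the interpolation, take x = 90. It lies between the mapped points 75 and 100, so the result is F(75) plus 15/25 of the gap to F(100). A quick check, using only values from the mapping; the helper name `interpolate` and its argument order are illustrative:

```shell
# Interpolate between (x_l, y_l) and (x_h, y_h) at x.
# Usage: interpolate x x_l y_l x_h y_h
interpolate() {
    awk -v x="$1" -v x_l="$2" -v y_l="$3" -v x_h="$4" -v y_h="$5" 'BEGIN {
        printf "%.1f\n", y_l + (y_h - y_l) * (x - x_l) / (x_h - x_l)
    }'
}

interpolate 90 75 33.7 100 34.5   # 90th percentile, between F(75) and F(100)
interpolate 40 25 31.8 50 32.2    # 40th percentile, between F(25) and F(50)
```

The two calls print 34.2 and 32.0, matching the 90th and 40th rows of the expected output.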
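awk '
#!/usr/bin/awk -f
# The line above only matters once this is saved as script.awk; adjust the path to your awk.
# asort() below requires GNU awk (gawk).

# PWLF interpolation: takes a value x and two arrays holding the X and Y breakpoints.
# The extra parameters after the gap are local variables.
function pwlf(x, px, py,    p_l, p_h, x_l, x_h, y_l, y_h) {
    # Shortcut for the index of the highest breakpoint <= x (breakpoints are 25 apart)
    p_l = 1 + int(x / 25)
    if (p_l > 4) p_l = 4        # keep x = 100 inside the last segment
    p_h = p_l + 1
    x_l = px[p_l]
    x_h = px[p_h]
    y_l = py[p_l]
    y_h = py[p_h]
    #print "X=", x, p_l, p_h, x_l, x_h, y_l, y_h
    return y_l + (y_h - y_l) * (x - x_l) / (x_h - x_l)
}

# Read the input values into the yy array
{ yy[NR] = $1 }

# Print the table
END {
    # Sort the values of yy in ascending order
    ny = asort(yy)
    # Create the xx array 0, 25, ..., 100 (this spacing assumes exactly 5 input values)
    for (i = 1; i <= ny; i++) xx[i] = 25 * (i - 1)
    # Prepare the list of requested percentiles
    ns = split("99 97 95 90 80 70 60 50 40 30 20 10 5", pv)
    for (i = 1; i <= ns; i++) printf "%dth %.1f\n", pv[i], pwlf(pv[i], xx, yy)
}
' input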
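Note that asort() is a GNU awk extension, so the script above may not run under other awk implementations such as mawk. One possible portable variant, assuming `sort -n` is available, sorts the data before awk sees it and derives the breakpoint spacing from the number of values, so it is not tied to exactly five inputs. The wrapper name `percentiles` and its argument handling are illustrative and not part of the original answer.

```shell
# Hypothetical wrapper. Usage: percentiles FILE P1 P2 ...
percentiles() {
    file=$1; shift
    LC_ALL=C sort -n "$file" | awk -v want="$*" '
        { y[NR] = $1 }                      # values arrive already sorted
        END {
            if (NR < 2) exit 1              # interpolation needs at least two points
            step = 100 / (NR - 1)           # 25 for five values
            n = split(want, p, " ")
            for (i = 1; i <= n; i++) {
                lo = 1 + int(p[i] / step)   # index of the highest breakpoint <= p[i]
                if (lo >= NR) lo = NR - 1   # keep 100 inside the last segment
                x_lo = (lo - 1) * step
                printf "%dth %.1f\n", p[i], y[lo] + (y[lo + 1] - y[lo]) * (p[i] - x_lo) / step
            }
        }'
}

# Sample data from the question, written to a scratch file
tmp=$(mktemp)
printf '%s\n' 34.5 32.2 33.7 30.4 31.8 > "$tmp"
percentiles "$tmp" 99 90 50 5
rm -f "$tmp"
```

Passing the requested percentiles as arguments keeps the list out of the awk program, which makes it easier to reuse with a different set.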
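Technically this is a bash script, but based on the comments on the question it is better to put the whole thing into script.awk and run it as a one-liner. The solution includes a '#!' line for that purpose: drop the surrounding awk '...' input wrapper so the '#!' line comes first, make the file executable, and run it directly.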
/path/to/script.awk < input