
Calculate Percentile(s) in Bash


I am trying to calculate a range of percentiles (5th-99th) in Bash for a text file that contains 5 values, one per line.

Input

34.5
32.2
33.7
30.4
31.8

Attempted Code

awk '{s[NR-1]=$1} END{print s[int(0.05-0.99)]}' input

Expected Output

99th    34.5
97th    34.4
95th    34.3
90th    34.2
80th    33.9
70th    33.4
60th    32.8
50th    32.2
40th    32.0
30th    31.9
20th    31.5
10th    31.0
5th     30.7
arnpry asked Nov 17 '19 15:11



1 Answer

To calculate percentiles from 5 values, one needs to map the sorted values to known percentiles and then interpolate between them. This mapping is called a piecewise linear function (a.k.a. pwlf).

F(100) = 34.5
F(75) = 33.7
F(50) = 32.2
F(25) = 31.8
F(0) = 30.4

Mapping any other x in the range 0..100 requires linear interpolation between F(L) and F(H), where L is the largest known percentile <= x and H is the next known percentile above it (L+25 here).
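As a quick check of the interpolation rule, here is a minimal sketch that evaluates a single point, the 90th percentile, which lies between F(75) = 33.7 and F(100) = 34.5. The helper name `interp` and its argument order are hypothetical and used only for illustration.

```shell
# interp X X_L X_H Y_L Y_H
# Linear interpolation of X between the points (X_L, Y_L) and (X_H, Y_H).
# Hypothetical helper, not part of the answer's script.
interp() {
  awk -v x="$1" -v xl="$2" -v xh="$3" -v yl="$4" -v yh="$5" \
    'BEGIN { printf "%.2f\n", yl + (yh - yl) * (x - xl) / (xh - xl) }'
}

# 90th percentile: 33.7 + (34.5 - 33.7) * (90 - 75) / (100 - 75)
interp 90 75 100 33.7 34.5   # prints 34.18
```

Rounded to one decimal this gives 34.2, which matches the 90th-percentile row of the expected output in the question.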

awk '
#! /bin/env awk
  # PWLF interpolation function: takes a value x and two arrays holding the known X and Y points
function pwlf(x, px, py) {
  # Shortcut for the index of the lower point (largest px[] <= x); valid because the X points are 25 apart
  p_l = 1+int(x/25)
  p_h = p_l+1
  x_l = px[p_l]
  x_h = px[p_h]
  y_l = py[p_l]
  y_h = py[p_h]
#print "X=", x, p_l, p_h, x_l, x_h, y_l, y_h
  return y_l+(y_h-y_l)*(x-x_l)/(x_h-x_l)
}

  # Read the input values into the yy array (xx is set up in END)
{ yy[n++] = $1 }

  # Print the table
END {
  # Sort the values of yy ascending, reindexing them 1..ny (asort is a GNU awk extension)
  ny = asort(yy) ;
  # Create xx array 0, 25, ..., 100
  for (i=1 ; i<=ny ; i++) xx[i]=25*(i-1)

  # Prepare list of requested results
  ns = split("99 97 95 90 80 70 60 50 40 30 20 10 5", pv)
  for (i=1 ; i<=ns ; i++) printf "%dth %.1f\n",  pv[i], pwlf(pv[i], xx, yy) ;
}
' input

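Technically this is a bash script, but based on the comments to the OP, it is better to place the whole thing into script.awk and execute it as a one-liner. The solution has a '#!' line so the awk script can be invoked directly. For that to work, the '#!' line must pass -f to the interpreter (for example `#!/usr/bin/gawk -f`, with the path adjusted for your system, since asort needs GNU awk), the `awk '` and `' input` wrapper lines must be dropped, and the file must be executable.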

/path/to/script.awk < input 
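The script above hard-codes a spacing of 25 between known percentiles, so it only fits exactly five input values, and it relies on gawk's asort. As a hedged sketch of one way to generalize it, the version below sorts with `sort -n` and interpolates at the fractional rank 1 + (n-1)*p/100, which reduces to the same 0, 25, ..., 100 mapping when n = 5. The function name `percentiles` and its argument layout are assumptions made for illustration, not part of the original answer.

```shell
# percentiles FILE P1 [P2 ...]
# Print linearly interpolated percentiles of the numbers in FILE (one per line).
# Hypothetical generalization of the answer's script; works for any number of values.
percentiles() {
  file=$1
  shift
  sort -n "$file" | awk -v plist="$*" '
    { v[NR] = $1 }                        # values arrive already sorted ascending
    END {
      if (NR == 0) exit 1                 # no data, nothing to interpolate
      n = split(plist, p, " ")
      for (i = 1; i <= n; i++) {
        pos = 1 + (NR - 1) * p[i] / 100   # fractional 1-based rank of percentile p[i]
        lo = int(pos)                     # index of the value at or below that rank
        hi = (lo < NR) ? lo + 1 : lo      # next value up, clamped at the maximum
        printf "%dth %.1f\n", p[i], v[lo] + (v[hi] - v[lo]) * (pos - lo)
      }
    }'
}

# Example call for the question's data:
# percentiles input 99 97 95 90 80 70 60 50 40 30 20 10 5
```

Using `sort -n` keeps the awk part portable to non-GNU awk implementations, at the cost of assuming the locale's decimal separator matches the data.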
dash-o answered Oct 05 '22 00:10
