
Calculate percentiles if we have probability density function data as x and y

I have data extracted from a probability density function (PDF) plot and stored in a CSV file, where x represents incubation times and y is the density. I would like to calculate percentiles, such as the 95th. I'm a bit confused: should I calculate the percentile using the x values only, i.e., using np.percentile(x, 95)?

Data in plot: [image not included]

sakurami asked Sep 16 '25 09:09


1 Answer

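Here is some code that uses np.trapz (as proposed by @pjs; NumPy 2.0 renamed it to np.trapezoid). Calling np.percentile on the x values alone would only describe the grid, not the distribution, because it ignores the densities in y. Instead, we take the x and y arrays and assume they describe a PDF, so we first normalize the integral to 1 and then shrink the upper integration limit step by step until the integral drops to 0.95. The x value at that point is the 95th percentile. I've made up a multi-peak function for the demonstration: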

import numpy as np
import matplotlib.pyplot as plt

N = 1000

x = np.linspace(0.0, 6.0*np.pi, N)
y = np.sin(x/2.0)/x # construct some multi-peak function
y[0] = y[1]         # x[0] == 0 gives 0/0 = nan, so patch it with its neighbor
y = np.abs(y)       # a density must be non-negative

plt.plot(x, y, 'r.')
plt.show()

# normalization
norm = np.trapz(y, x)
print(norm)

y = y/norm
print(np.trapz(y, x)) # after normalization

# now compute the integral, cutting the right limit down by one point
# with each iteration; stop as soon as it drops to 0.95 or below
for k in range(0, N):
    xx = x[:N - k]
    yy = y[:N - k]
    v = np.trapz(yy, xx)
    print(f"Integral {k} from {xx[0]} to {xx[-1]} is equal to {v}")
    if v <= 0.95:
        break

# xx[-1] is the grid point where the integral first drops to 0.95 or below
print(f"95th percentile is approximately x = {xx[-1]}")
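
The loop above recomputes the whole integral at every step. As a possible alternative (a minimal sketch, not part of the original answer), the cumulative trapezoid sums can be built once and the resulting CDF inverted with np.interp. The helper name pdf_percentile is made up for illustration, and the sketch assumes x is sorted in ascending order and y is non-negative, so that the CDF never decreases.

```python
import numpy as np


def pdf_percentile(x, y, q):
    """Return the x value where the CDF built from (x, y) reaches q (0 < q < 1).

    Hypothetical helper; assumes x is ascending and y is non-negative.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Trapezoid area of each interval, accumulated into an unnormalized CDF.
    areas = 0.5 * (y[1:] + y[:-1]) * np.diff(x)
    cdf = np.concatenate(([0.0], np.cumsum(areas)))
    cdf /= cdf[-1]  # normalize so the CDF ends at exactly 1
    # Invert the CDF by linear interpolation between grid points.
    return float(np.interp(q, cdf, x))


# Same made-up multi-peak density as above.
x = np.linspace(0.0, 6.0 * np.pi, 1000)
y = np.empty_like(x)
y[1:] = np.abs(np.sin(x[1:] / 2.0) / x[1:])
y[0] = y[1]  # avoid the 0/0 at x = 0

p95 = pdf_percentile(x, y, 0.95)
print(f"95th percentile is at x = {p95:.3f}")
```

Because the result is interpolated, it can fall between grid points, whereas the loop always stops on a grid point. For real data, x and y would be the two columns read from the CSV file.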

Severin Pappadeux answered Sep 19 '25 03:09