
Obtaining the critical values needed for the Kolmogorov-Smirnov test

I'm talking about retrieving the values of this table via a Python function:

https://www.soest.hawaii.edu/GG/FACULTY/ITO/GG413/K_S_Table_one_Sample.pdf

I've been looking for a while, but none of the SciPy functions I found return this value, and to be honest I'm getting pretty confused.

I've looked through SciPy's built-in functions without success. For example, in the aforementioned table, D[0.1, 10] == 0.36866. Yet scipy.stats.kstest does NOT return this value, no matter how much I play with my data.

ayy_chemixd asked Oct 17 '22 10:10


1 Answer

This can be done with SciPy by using the ksone distribution and its ppf (percent point function) method rather than kstest:

from scipy.stats import ksone

def ks_critical_value(n_trials, alpha):
    return ksone.ppf(1-alpha/2, n_trials)
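
For context on why kstest never returns the table value: kstest computes the statistic D from your data (together with a p-value), whereas the table holds the threshold that D is compared against. Here is a minimal pure-Python sketch of that comparison, assuming a hypothesized uniform CDF on [0, 1]. The ks_statistic helper and the sample are made up for illustration, and 0.63604 is the n = 3, alpha = 0.1 entry from the table:

```python
def ks_statistic(sample, cdf):
    """Two-sided KS statistic D = max(D+, D-) of a sample against a CDF."""
    xs = sorted(sample)
    n = len(xs)
    # D+: largest amount by which the empirical CDF exceeds the hypothesized CDF
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: largest amount by which the hypothesized CDF exceeds the empirical CDF
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (assumed null hypothesis)."""
    return min(1.0, max(0.0, x))


sample = [0.1, 0.4, 0.7]   # illustrative data, not from the question
critical_value = 0.63604   # table entry for n = 3, alpha = 0.1

d = ks_statistic(sample, uniform_cdf)
reject = d > critical_value  # reject the null only if D exceeds the threshold
print(d, reject)
```

The critical value depends only on n and alpha, which is why it comes from a distribution object rather than from a function that takes data.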

Printing a table of critical values:

from __future__ import print_function # For Python 2

trials = range(1, 41)
alphas = [0.1, 0.05, 0.02, 0.01]

# Print table headers
print('{:<6}|{:<6} Level of significance, alpha'.format(' ', ' '))
print('{:<6}|{:>8} {:>8} {:>8} {:>8}'.format(*['Trials'] + alphas))
print('-' * 42)
# Print critical values for each n_trials x alpha combination
for t in trials:
    print('{:6d}|{:>8.5f} {:>8.5f} {:>8.5f} {:>8.5f}'
          .format(*[t] + [ks_critical_value(t, a) for a in alphas]))
    if t % 10 == 0:
        print()

Partial output:

      |       Level of significance, alpha
Trials|     0.1     0.05     0.02     0.01
------------------------------------------
     1|     nan      nan      nan      nan
     2| 0.77639  0.84189      nan      nan
     3| 0.63604  0.70760  0.78456  0.82900
     4| 0.56522  0.62394  0.68887  0.73424
     5| 0.50945  0.56328  0.62718  0.66853
     6| 0.46799  0.51926  0.57741  0.61661
     7| 0.43607  0.48342  0.53844  0.57581
     8| 0.40962  0.45427  0.50654  0.54179
     9| 0.38746  0.43001  0.47960  0.51332
    10| 0.36866  0.40925  0.45662  0.48893

    11| 0.35242  0.39122  0.43670  0.46770
    12| 0.33815  0.37543  0.41918  0.44905
    13| 0.32549  0.36143  0.40362  0.43247
    14| 0.31417  0.34890  0.38970  0.41762
    15| 0.30397  0.33760  0.37713  0.40420
    16| 0.29472  0.32733  0.36571  0.39201
    17| 0.28627  0.31796  0.35528  0.38086
    18| 0.27851  0.30936  0.34569  0.37062
    19| 0.27136  0.30143  0.33685  0.36117
    20| 0.26473  0.29408  0.32866  0.35241

We need some additional feedback from a statistician on (a) why we get np.nan values in the top two rows (I assume because the critical values for these combinations of n_trials and alpha are purely theoretical and not achievable in practice), and (b) why the ksone.ppf method needs alpha to be divided by 2. I will edit this answer to include that information.

You can see, though, that apart from the initial missing values, this code reproduces the table in your question and the table on page 16 of this paper.
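
On point (b): as far as I can tell, ksone models the one-sided statistic D+, while the table in the question is for the two-sided test, and the two-sided critical value at alpha is conventionally approximated by the one-sided one at alpha/2. Below is a sketch that cross-checks this without SciPy. It assumes the closed-form sum for P(D+ >= d) usually attributed to Birnbaum and Tingey and inverts it by bisection. The function names are mine, not SciPy's:

```python
import math


def ks_one_sided_sf(d, n):
    """P(D+ >= d) for sample size n, via the Birnbaum-Tingey sum (assumed formula)."""
    if d <= 0.0:
        return 1.0
    if d >= 1.0:
        return 0.0
    total = 0.0
    for j in range(int(math.floor(n * (1.0 - d))) + 1):
        # Clamp guards against a tiny negative base caused by rounding
        base = max(0.0, 1.0 - d - j / n)
        total += math.comb(n, j) * base ** (n - j) * (d + j / n) ** (j - 1)
    return d * total


def ks_critical_value_exact(n, alpha, tol=1e-10):
    """Two-sided critical value, approximated as the one-sided value at alpha/2."""
    target = alpha / 2.0
    lo, hi = 0.0, 1.0
    # The survival function decreases in d, so bisection finds sf(d) == target
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if ks_one_sided_sf(mid, n) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(ks_critical_value_exact(10, 0.1), 5))  # compare with the n = 10, alpha = 0.1 entry
```

For n = 1 the sum reduces to 1 - d, so this sketch returns a finite value where the output above shows nan. That hints that the nan entries may come from the numerical routine rather than from the statistics, but treat it as a hypothesis.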

ajrwhite answered Oct 20 '22 22:10